Abstract

Biological  activity  has  shaped  the  surface  of  the  earth  in  numerous  ways,  but  life’s  most 
pervasive and persistent global impact has been the secular oxidation of the surface
environment.  Through  primary  production  -  the  biochemical  reduction  of  carbon  dioxide 
to  synthesize  biomass  -  large  amounts  of  oxidants  such  as  molecular  oxygen,  sulfate  and 
ferric  iron  have  accumulated  in  the  ocean,  atmosphere  and  crust,  fundamentally  altering 
the  chemical  environment  of  the  earth’s  surface.  This  thesis  addresses  aspects  of  the  role 
of  marine  microorganisms  in  driving  this  process.  In  the  first  section  of  the  thesis, 
biomarkers  (hydrocarbon  molecular  fossils)  are  used  to  investigate  the  early  history  of 
microbial  diversity  and  biogeochemistry.  Molecular  fossils  from  the  Transvaal 
Supergroup,  South  Africa,  document  the  presence  in  the  oceans  of  a  diverse  microbiota, 
including  eukaryotes,  as  well  as  oxygenic  photosynthesis  and  aerobic  biochemistry,  by 
ca.  2.7Ga.  Experimental  study  of  the  oxygen  requirements  of  steroid  biosynthesis 
suggests  that  sterane  biomarkers  in  late  Archean  rocks  are  consistent  with  the  persistence 
of microaerobic surface ocean environments long before the initial oxygenation of the
atmosphere.  In  the  second  part,  using  Prochlorococcus  (a  marine  cyanobacterium  that  is 
the  most  abundant  primary  producer  on  earth  today)  as  a  model  system,  we  explored  how 
microbes  use  the  limited  nutrient  resources  available  in  the  marine  environment  to  make 
the protein catalysts that enable primary production. Quantification of the
Prochlorococcus  proteome  over  the  diel  cell-division  cycle  reveals  that  protein 
abundances  are  distinct  from  transcript-level  dynamics,  and  that  small  temporal  shifts  in 
enzyme  levels  can  redirect  metabolic  fluxes.  This  thesis  illustrates  how  molecular 
techniques can contribute to a systems-level understanding of biogeochemical processes,
which  will  aid  in  reconstructing  the  past  of,  and  predicting  future  change  in,  earth  surface 
environments. 
Introduction


The  coevolution  of  life  and  earth  surface  environments  is  central  to  the  history  and 
functioning  of  our  planet.  Life  and  its  habitats  coevolve  through  a  wide  array  of 
reciprocal  interactions,  constantly  applying  pressures  to  and  driving  changes  in  one 
another.  Only  life  evolves  in  a  Darwinian  sense  and  has  a  distinct  mechanism  for 
heritable  transmission  of  information.  But  the  impact  of  biology  on  the  geochemistry  of 
the  oceans,  atmosphere  and  crust  is  so  pervasive  that  the  present  distributions  of  even 
inorganic  chemicals  are  inexplicable  without  acknowledging  life’s  influence.  When  the 
changes  in  geochemical  distributions  over  time,  evident  in  the  rock  record,  are  taken  into 
account,  biology’s  role  appears  even  more  central. 

The  most  enduring  and  globally  significant  change  that  life  has  driven  over  the  course  of 
earth  history  has  been  the  progressive  oxidation  of  the  surface  environment.  As  a  result 
of  biological  activity,  particularly  photosynthesis,  oxidizing  power  has  been  continuously 
exported  by  the  carbon  cycle  and  has  accumulated  in  reservoirs  such  as  molecular 
oxygen,  sulfate,  and  ferric  iron.  This  process  has  increased  redox  disequilibrium  between 
the  surface  environment  (including  the  atmosphere,  oceans  and  upper  crust)  and  the 
deeper  solid  earth,  with  broad  consequences  for  the  cycling  of  nearly  all  elements. 
Progressive  oxidation  has  also  been  a  defining  control  on  the  ecological  distribution  of 
microbes  and  on  the  biogeochemistry  that  results  from  their  activities. 

HOW DID LIFE OXIDIZE THE EARTH’S SURFACE?

The  secular  oxidation  of  the  surface  environment  has  been  driven  in  large  part  by  primary 
production.  Life,  of  course,  does  not  create  or  destroy  electrons;  “secular  oxidation” 
means  the  redistribution  of  reducing  equivalents,  principally  from  a  variety  of  electron- 
donating  species  to  carbon.  (The  only  recognized  process  that  truly  changes  the  redox 
state  of  the  planet  as  a  whole  is  the  escape  of  hydrogen  to  space,  which,  while  in  itself  a 
purely physical phenomenon, has also been strongly influenced by biology (Catling et al.,
2001).) In the process, oxidized species such as O2, Fe(III) and SO4^2- accumulate, while
carbon undergoes net reduction. It is in the molecular action of the biochemical machinery
of  autotrophy  that  the  sum  total  of  this  redistribution  takes  place.  The  other  various  and 
complex  components  of  the  carbon  cycle  -  respiration,  burial,  weathering,  subduction 
and  so  on  -  act  to  reverse  autotrophs’  accomplishments  and  make  the  net  redistribution 
only  a  tiny  fraction  of  the  gross. 
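
As an illustrative aside, the redistribution of reducing equivalents from different electron donors to carbon can be written as idealized net reactions. These are standard textbook stoichiometries, with biomass approximated as CH2O; they are an assumption of this sketch rather than equations taken from the thesis.

```latex
\begin{align}
\mathrm{CO_2 + H_2O} &\rightarrow \mathrm{CH_2O + O_2} \\
\mathrm{CO_2 + 2\,H_2S} &\rightarrow \mathrm{CH_2O + H_2O + 2\,S^0} \\
\mathrm{CO_2 + 4\,Fe^{2+} + 11\,H_2O} &\rightarrow \mathrm{CH_2O + 4\,Fe(OH)_3 + 8\,H^+} \\
\mathrm{CO_2 + 2\,H_2} &\rightarrow \mathrm{CH_2O + H_2O}
\end{align}
```

In each case four electrons are transferred per carbon atom fixed (carbon goes from the +4 to the 0 oxidation state), and the oxidized counterpart of the donor (O2, elemental sulfur, ferric iron or water) is left behind in the environment.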

The  planetary-scale  workings  of  the  carbon  cycle  as  a  whole  have  determined  the  rate  of 
secular  oxidation.  But  what  controls  the  gross  rate  of  autotrophic  transfer  of  electrons 
from  donors  to  carbon?  Most  basically,  this  process  requires  a  source  of  carbon  and  a 
suitable source of reducing power. Carbon has generally not been a limiting nutrient for
primary production over earth history, as attested by the consistent 13C depletion of
sedimentary organic carbon below that of marine carbonates, which suggests that
enzymatic discrimination between C isotopes has generally been expressed (in contrast,
for example, to the situation for S isotopes (Canfield, 2001)). Notable exceptions include
photosynthesis by some terrestrial and aquatic C3 plants in the modern, low-CO2
atmosphere  (Wullschleger,  1993)  and  certain  geochemically  extraordinary  habitats  such 
as  the  Lost  City  hydrothermal  system  (Bradley  et  al.,  2009).  The  former  is  a  geologically 
recent  phenomenon  of  the  late  Cenozoic,  while  the  latter  is  a  spatially-constrained 
phenomenon.  During  the  “snowball  earth”  episodes  of  the  Neoproterozoic,  marine 
primary  producers  may  also  have  been  limited  by  some  combination  of  carbon-  and  light- 
starvation  due  to  permanent  sea  ice  cover,  which  may  have  influenced  the  evolution  of 
carbon  concentrating  mechanisms  (Riding,  2006).  For  the  vast  majority  of  times  and 
places  in  earth  history,  though,  autotrophs  in  sunlit  aquatic  habitats  have  not  been  carbon- 
limited. 

Primary  electron  donors  have  a  greater  potential  to  limit  primary  production,  though 
whether they have truly done so on a global scale is unclear. Supplies of reductants such
as sulfide, ferrous iron and molecular hydrogen are ultimately dependent on volcanism
and  rock  weathering  and  hardly  ever  approach  the  availability  of  inorganic  carbon.  The 
invention  of  oxygenic  photosynthesis,  which  uses  water  as  an  electron  donor,  might  have 
relieved  reductant  limitation;  if  it  did,  oxygenic  primary  producers  ought  to  have 
proliferated  rapidly.  Water  is  in  unlimited  supply  in  aquatic  environments,  and  has  only 
become  growth-limiting  in  geologically-recent  periods  as  vascular  plants  have  colonized 
drier  parts  of  the  land  surface.  Examining  the  pace  and  character  of  the  global  ecological 
succession  from  anoxygenic  to  oxygenic  primary  production  would  tell  us  much  about 
the  limitations  faced,  and  the  adaptive  strategies  employed,  by  photoautotrophs. 

When  -  and  how  quickly  -  did  oxygenic  photosynthesis  arise? 

Knowing  when  life  developed  the  ability  to  produce  O2  would  provide  a  valuable 
landmark  for  reconstructions  of  biological  diversity  and  environmental  conditions  over 
earth  history.  In  particular,  assessing  the  relative  timing  of  the  origin  of  oxygenic 
photosynthesis  and  the  oxygenation  of  various  parts  of  the  surface  environment  (the 
atmosphere,  surface  and  deep  oceans,  pore  waters,  etc.)  would  lend  insight  into  the 
pacing  and  feedbacks  involved  in  coevolutionary  processes.  Chapters  2  and  3  of  this 
thesis  address  the  timing  and  biogeochemical  consequences  of  the  evolution  of  oxygenic 
photosynthesis  using  biomarkers  -  hydrocarbon  molecular  fossils  of  membrane  lipids 
produced  by  microorganisms  and  preserved  in  sedimentary  rocks. 

Chapter  2  presents  findings  of  a  molecular  fossil  investigation  of  late  Archean  marine 
sedimentary  rocks  from  the  Transvaal  Supergroup,  South  Africa  dating  from  ca.  2.67- 
2.46Ga.  Previous  molecular  fossil  studies  had  been  carried  out  on  rocks  of  similar  age 
from Western Australia (Brocks et al., 1999; Brocks et al., 2003; Eigenbrode et al.,
2008), and documented a diverse microbiota, including the oldest molecular evidence for
oxygenic cyanobacteria. However, those findings have remained controversial
(Rasmussen et al., 2008), due in part to the rock samples having been recovered many
years earlier as part of a resource-exploration drilling project, and thus potentially
compromised  by  hydrocarbon  contamination  during  core  recovery  and/or  storage.  The 
work  presented  here  is  the  first  to  have  been  conducted  on  freshly-extracted  core  material 
that  had  been  drilled  without  the  usual  hydrocarbon  lubricants  for  the  express  purpose  of 
organic  geochemical  investigation.  Furthermore,  exceptional  care  was  taken  and  novel 
analytical  methods  developed  (described  in  Appendix  B)  to  ensure  that  syngenetic 
molecular  fossils  were  not  compromised  or  overprinted  with  exogenous  contaminants. 

To  the  first-order  question,  is  there  reliable  evidence  of  oxygenic  photosynthesis  in 
late  Archean  sedimentary  rocks?,  the  answer  is  clearly  affirmative.  Multiple  types  of 
molecular  fossils  found  in  the  Transvaal  rocks  are  indicative  of  both  the  presence  of 
cyanobacteria  and  the  utilization  of  molecular  oxygen  by  other  microorganisms  in  the 
community.  This  result,  in  accord  with  other  recently-described  lines  of  evidence  (Anbar 
et al., 2007; Bosak et al., 2009; Frei et al., 2009), suggests that oxygen production was
occurring  in  some  marine  ecosystems  at  least  250-300  million  years  before  the  first  signs 
of  widespread  atmospheric  oxygenation.  Beyond  aerobiosis,  what  types  of  organisms 
and  metabolisms  were  present  in  the  ocean  before  the  oxygenation  of  the 
atmosphere?  The  molecular  fossil  record  documents  a  broad  taxonomic  range  of 
microbes  -  including  Eubacteria,  Archaea,  and  Eukaryotes  -  implying  that  the  three 
Domains  of  cellular  life  had  all  originated  by  the  late  Archean.  The  broader  evidence  for 
active biogeochemical cycling of most elements also suggests that much of higher-level
microbial  diversification  had  taken  place  by  this  time. 

Chapter  3  of  this  thesis  addresses  the  biogeochemical  implications  of  one  class  of 
molecular  fossils:  the  steranes.  These  hydrocarbons  are  the  diagenetic  products  of 
steroids,  and  finding  them  in  late  Archean  rocks  is  particularly  significant  because  steroid 
biosynthesis  requires  molecular  oxygen.  Thus  their  presence  in  marine  sediments  of  this 
era  implies  that  dissolved  O2  was  available  in  biochemically  significant  concentrations  in 
at  least  some  parts  of  the  ocean.  Simultaneously,  multiple  geochemical  proxies  suggest 
an atmosphere with an O2 content <10^-5 of the present level until ca. 2.4Ga. Are
prevailing  interpretations  of  the  molecular  fossil  and  geochemical  proxies  for 
oxygenation  mutually  inconsistent?  To  answer  this  question,  we  investigated  the 
oxygen  requirements  of  de  novo  steroid  biosynthesis.  By  growing  the  facultatively 
anaerobic  eukaryote  Saccharomyces  cerevisiae  under  well-defined,  microaerobic 
conditions  and  tracking  the  incorporation  of  isotopically-labeled  carbon  into  sterols,  we 
found  that  steroid  biosynthesis  is  enabled  by  oxygen  concentrations  as  low  as  nanomolar. 
Using  constraints  from  Precambrian  atmospheric  evolution  models,  we  show  that  such 
microaerobic  regions  of  the  surface  ocean  could  have  been  persistent  features  long  before 
the  oxygenation  of  the  atmosphere. 

Taken  together,  these  results  contribute  to  understanding  of  the  nature  and  tempo  of 
biogeochemical  change  in  the  early  Precambrian.  Rather  than  the  evolution  of 
cyanobacteria  suddenly  relieving  global  reductant  limitation  and  driving  rapid 
oxygenation  of  the  atmosphere  and  oceans,  the  process  appears  to  have  been  much  more 
gradual.  The  origination  of  cyanobacteria  (along  with  most  other  high-level  microbial 
diversity) before 2.7Ga was followed by an extended period of close coexistence between
oxygenic and anoxygenic primary producers (Johnston et al., 2009).

HOW DO PRIMARY PRODUCERS USE NUTRIENTS?

If  carbon  has  not  been  growth  limiting  until  perhaps  the  late  Cenozoic,  and  reductants 
have  not  been  limiting  even  since  before  the  invention  of  oxygenic  photosynthesis  (and 
certainly  not  afterwards),  what  has  limited  primary  production  over  most  of  earth  history? 
With  both  reactants  (inorganic  carbon  and  electron  donors)  in  abundant  supply  but  the 
inherent  kinetics  terribly  slow,  the  rate  of  reaction  is  limited  by  the  availability  and 
activity  of  catalyst.  Life  provides  catalysts  for  autotrophy  in  the  form  of  protein-based 
enzymes.  These  enzymes  are  made  of  nitrogen-rich  amino  acids,  are  encoded  and 
expressed  by  phosphate-rich  nucleic  acids,  and  require  a  wide  variety  of  metals  as 
cofactors. Thus the kinetic limitation of the gross rate of carbon fixation is also limitation
by  nutrients. 

The  second  part  of  this  thesis  explores  how  a  particular  marine  primary  producer, 
Prochlorococcus,  utilizes  nutrients  to  make  the  biochemicals  it  needs  to  grow.  A 
cyanobacterium  abundant  in  the  tropical  and  subtropical  epipelagic  ocean, 
Prochlorococcus  contributes  significantly  to  global  primary  production  while  living  in 
one of the most nutrient-deplete habitats on earth (Partensky et al., 1999; Coleman and
Chisholm, 2007). How it accomplishes this feat is intriguing from a metabolic
biochemistry standpoint and central to our understanding of the functioning of
nutrient-limited ecosystems in the past and present.

The  time  scale  of  this  investigation  is  rather  shorter  than  the  geologic  spans  considered  in 
Chapters 2 and 3: we focus on the dynamics of gene expression over the 24-hour diel
cycle, to which cell division in Prochlorococcus is closely synchronized (Vaulot et al.,
1995).  Chapter  4  presents  results  of  a  proteomic,  transcriptomic  and  regulatory 
investigation  of  the  diel  cell  cycle.  In  particular,  we  sought  to  address  the  question,  what 
are  the  differences  in,  and  controls  on,  gene  expression  patterns  between  mRNA  and 
proteins  during  the  diel  cell  cycle?  We  found  that  temporal  variations  in  protein 
abundance  are  substantially  smaller  than  the  oscillations  in  their  respective  transcripts 
(Zinser et al., 2009). Physiologically-essential shifts in metabolic fluxes occurring over
the  diel  cycle  appear  to  be  driven  by  fairly  small  changes  in  the  relative  abundance  of 
enzymes.  This  includes  central  pathways  of  carbon  fixation  and  respiration,  suggesting 
that  Prochlorococcus'  role  as  a  primary  producer  hinges  on  precise  control  of  a  metabolic 
network  poised  near  a  flux  balancing  point. 
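
One simple way to see how large transcript oscillations can coexist with small protein oscillations is a first-order model in which protein is synthesized in proportion to a sinusoidally varying mRNA level and lost at a constant rate. The sketch below is an illustration under that assumption only; it is not the analysis performed in Chapter 4, and the function name and the half-life values are hypothetical. For such a model, the relative amplitude of the protein oscillation equals that of the transcript multiplied by 1/sqrt(1 + (omega/k)^2), where omega is the angular frequency of the cycle and k is the effective protein loss rate (degradation plus dilution by growth).

```python
import math


def protein_amplitude_ratio(period_h: float, protein_half_life_h: float) -> float:
    """Ratio of relative protein amplitude to relative mRNA amplitude.

    Assumes dP/dt = k_syn * m(t) - k_loss * P with sinusoidal m(t),
    i.e. a first-order low-pass filter (an assumption of this sketch).
    """
    if period_h <= 0 or protein_half_life_h <= 0:
        raise ValueError("period and half-life must be positive")
    omega = 2.0 * math.pi / period_h              # angular frequency, 1/h
    k_loss = math.log(2.0) / protein_half_life_h  # effective loss rate, 1/h
    return 1.0 / math.sqrt(1.0 + (omega / k_loss) ** 2)


if __name__ == "__main__":
    # Hypothetical effective half-lives, in hours, over a 24 h diel cycle.
    for half_life in (1.0, 6.0, 24.0):
        ratio = protein_amplitude_ratio(24.0, half_life)
        print(f"half-life {half_life:5.1f} h -> amplitude ratio {ratio:.2f}")
```

Under these assumptions, a protein whose effective half-life is comparable to the cell-division time passes only a small fraction of the transcript oscillation through to the protein level, while a short-lived protein tracks its transcript closely.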

The  whole-cell,  quantitative  portrait  of  gene  expression  generated  in  this  experiment  also 
enabled  us  to  ask,  which  proteins  constitute  the  greatest  proportion  of  the  proteome, 
and  is  the  composition  of  the  proteome  strongly  remodeled  over  the  diel  cycle? 
Given the phototrophic lifestyle of Prochlorococcus, we might expect that the cells bear a
quite  distinct  complement  of  enzymes  in  the  light  and  dark  periods  of  the  day.  Somewhat 
surprisingly,  we  found  that  the  overall  composition  of  the  proteome,  as  gauged  by  the 
fractional  abundance  of  various  gene  products,  remained  generally  constant.  The  set  of 
highly  abundant  proteins  included  expected  enzymes  involved  in  photosynthesis,  C 
fixation,  nutrient  acquisition  and  cell  growth,  but  also  some  unexpected  oxidative-stress 
and  biosynthetic  systems.  There  did  not  appear  to  be  evidence  for  large-scale 
redistributions  of  cellular  resources  (at  least  in  terms  of  protein)  to  different  metabolic 
systems  over  the  course  of  the  day-night  cycle.  Since  many,  if  not  most,  physiological 
processes  in  Prochlorococcus  are  temporally  variable  to  some  extent,  the  stability  of  its 
proteome  composition  underlines  the  well-tuned  nature  of  cellular  metabolic  networks. 

Chapter  5  presents  a  model  of  Prochlorococcus  cellular  composition  based  on  this 
systems-level  view  of  metabolism  and  gene  product  abundances.  To  provide  initial 
bounds on the problem, we asked, how are the major elemental complements of
Prochlorococcus cells apportioned among various pools of biochemicals? Using a
combination  of  experimental  and  genomic  data,  we  calculated  carbon,  nitrogen  and 
phosphorus  distributions  among  the  major  biochemical  constituents  of  a  hypothetical 
Prochlorococcus  cell.  From  these  budgets  we  infer  that  almost  half  of  cellular  P  is  in  the 
chromosome,  that  cells  contain  only  a  few  hundred  ribosomes,  and  that  protein  copy 
numbers  probably  span  about  four  orders  of  magnitude. 
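
The kind of bookkeeping behind such a budget can be sketched for the chromosomal phosphorus pool. Double-stranded DNA carries one phosphate per nucleotide, so two phosphorus atoms per base pair. The genome size and the cellular P quota used below are assumed, illustrative inputs (roughly the scale of a streamlined high-light-adapted genome and of a P-poor cell), not values taken from Chapter 5, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23   # atoms per mole
P_ATOMIC_MASS = 30.974     # g per mole of phosphorus


def chromosome_p_fg(genome_bp: float, copies: float = 1.0) -> float:
    """Mass of phosphorus (femtograms) in double-stranded chromosomal DNA.

    Each base pair contributes two phosphate groups, one per strand.
    """
    p_atoms = 2.0 * genome_bp * copies
    return p_atoms / AVOGADRO * P_ATOMIC_MASS * 1e15


def chromosome_p_fraction(genome_bp: float, cell_p_fg: float,
                          copies: float = 1.0) -> float:
    """Fraction of total cellular P held in the chromosome."""
    if cell_p_fg <= 0:
        raise ValueError("cellular P quota must be positive")
    return chromosome_p_fg(genome_bp, copies) / cell_p_fg


if __name__ == "__main__":
    genome_bp = 1.66e6   # assumed genome size, base pairs
    cell_p_fg = 0.35     # assumed cellular P quota, fg per cell
    print(f"chromosomal P: {chromosome_p_fg(genome_bp):.3f} fg")
    print(f"fraction of cell P: {chromosome_p_fraction(genome_bp, cell_p_fg):.2f}")
```

With these assumed inputs the chromosome accounts for a little under half of cellular P. The result scales linearly with genome size and chromosome copy number and inversely with the P quota, so the inferred fraction is only as good as those inputs.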

We  then  used  these  cellular  elemental  and  biochemical  budgets  to  inform  two  outstanding 
questions  regarding  the  ecology  and  evolution  of  Prochlorococcus.  First,  is  genome 
streamlining  in  Prochlorococcus  primarily  driven  by  adaptation  to  nutrient 
limitation? The small, A+T-rich genomes, especially of high-light-adapted strains, have
been proposed to result from selection for lowered nutrient requirements (Dufresne et al.,
2005;  Partensky  and  Garczarek,  2010).  Instead,  we  found  that  the  effects  of  genome 
streamlining  on  cellular  nutrient  budgets  were  likely  marginal,  and  some  of  the  more 
significant nutrient economies actually oppose the streamlining trend. It seems more
plausible that a combination of genetic- and population-level processes is responsible
for  many  of  the  large-scale  features  of  genome  evolution  in  marine  picocyanobacteria. 
Second,  how  might  stochasticity  in  gene  expression  affect  the  growth  of 
Prochlorococcus cells? As small, nutrient-poor cells with low-integer numbers of copies
of  most  gene  products,  Prochlorococcus  could  be  susceptible  to  strong  stochastic  effects 
in  expressing  its  genes,  which  could  lead  to  metabolic  and  phenotypic  instability  (Raj  and 
van  Oudenaarden,  2008).  However,  the  slow  translation  rate  in  Prochlorococcus  would 
act  to  damp  expression  noise  at  the  protein  level,  and  we  develop  a  hypothesis  for  the 
potential  of  stochastic  effects  to  indirectly  limit  the  growth  of  small  cells. 
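
To give a rough sense of why low copy numbers matter, the sketch below uses the simplest possible noise model, in which the copy number of a gene product is Poisson-distributed, so that the coefficient of variation is 1/sqrt(mean). This is an assumption made here for illustration; real expression noise is generally at least this large, and the sketch is not the hypothesis developed in Chapter 5. The function name is hypothetical.

```python
import math


def poisson_cv(mean_copies: float) -> float:
    """Coefficient of variation (std/mean) for a Poisson-distributed copy number."""
    if mean_copies <= 0:
        raise ValueError("mean copy number must be positive")
    return 1.0 / math.sqrt(mean_copies)


if __name__ == "__main__":
    # Mean copy numbers spanning four orders of magnitude.
    for n in (1, 10, 100, 1000, 10000):
        print(f"mean copies {n:6d} -> relative noise {poisson_cv(n):.3f}")
```

Under this assumption a gene product present at a few copies per cell fluctuates by tens of percent or more from cell to cell, whereas one present at thousands of copies varies by only a few percent.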
Cores recovered during the Agouron Griqualand Drilling Project contain over 2500 m of well-preserved
late Archean Transvaal Supergroup sediments, dating from ca. 2.67 to 2.46 Ga. Bitumen extracts of these
strata were obtained using clean drilling, sampling and analysis protocols that avoided overprinting
syngenetic molecular fossil signatures with contaminant hydrocarbons. Comparisons of biomarker contents
in stratigraphically correlated intervals from diverse lithofacies in two boreholes separated by 24 km, as
well as across a ~2 Gyr unconformity, provide compelling support for their syngenetic nature. The suite
of molecular fossils identified in the late Archean bitumens includes hopanes attributable to bacteria,
potentially including cyanobacteria and methanotrophs, and steranes of eukaryotic origin. This molecular
fossil record supports an origin in the Archean Eon of the three Domains of cellular life, as well as of
oxygenic photosynthesis and the anabolic use of O2.

©  2009  Elsevier  B.V.  All  rights  reserved. 


1.  Introduction 

Widely accepted evidence for an active microbial biosphere
during the Archean Eon (3.8-2.5 Ga) includes physically preserved
objects, such as microfossils and stromatolites, and a range of chemical,
isotopic and geologic signatures of biogeochemical processes
(Schopf and Walter, 1983; Knoll, 2003; Schopf, 2006). Shales bearing
abundant organic matter attest to vigorous primary production
in marine ecosystems by the middle Archean, and biotic activity
may have also played a role in deposition of the massive iron formations
of the period (Cloud and Licari, 1968; Cloud, 1973). Although
there is little doubt that life had established itself throughout much
of the oceans no later than ca. 3.4 Ga (Allwood et al., 2006), there is
scant information about what types of organisms were present in
Archean marine environments, or what sorts of metabolic processes
they relied on.

While there are numerous reports of microfossils in Archean
sediments (Schopf, 2006), it is generally agreed that morphology
cannot consistently document the phylogenetic affinities or physiological
capabilities of Archean microbes. Several sets of criteria
for judging the biogenicity of microstructures have been proposed
(Schopf, 2006). Archean stromatolites are also controversial biogenic
remains (Walter et al., 1980; Walter, 1983; Grotzinger and
Rothman, 1996; Hofmann et al., 1999). Their occurrence and diverse
morphologies tend to be associated with shallow water depositional
settings (Allwood et al., 2006), and some authors attribute
particular deposits to microbes capable of oxygenic (Buick, 1992;
Altermann et al., 2006) or anoxygenic (Bosak et al., 2007) photosynthesis.

Chemical and isotopic traces of Archean life are widespread
and may sometimes be directly associated with particular microfossils
(House et al., 2000). Sulfur isotopic data provide indirect
evidence for the early evolution of sulfate reduction (Shen et al.,
2001; Shen and Buick, 2004). Carbon isotopes of total organic
carbon in Archean rocks provide more general information about
biogeochemical processes such as carbon assimilation, methanogenesis,
methanotrophy and aerobiosis (Hayes, 1983; Eigenbrode
and Freeman, 2006). This sedimentary organic matter is the direct
geological legacy of microbial activity and, if it is of sufficiently low
thermal grade, there is potential for a far more detailed evaluation
of the microbiota present at the time of deposition using particular
kinds of hydrocarbons, or biomarkers, preserved therein (Brocks
and Summons, 2003).

Fossil biomarkers are chemically stable molecules that derive
from the carbon skeletons of precursor lipids. Biomarkers become
incorporated into sediments, either freely as bitumen or bound
into macromolecular organic matter (kerogen), where they may be
preserved for billions of years (Eglinton et al., 1964; Brocks et al.,
1999, 2003b). Where these compounds occur intact and uncontaminated,
they represent a direct avenue for ancient organisms to leave
identifiable traces of themselves in the fossil record. In contrast to
bulk chemical and isotopic data, which only carry circumstantial
evidence of the metabolic attributes of their sources, biomarkers
can carry specific information about the identities and physiologies
of organisms because they were, in their original state, functioning
components of living cells. In their preserved state, they have
chemical structures derived from the original biomolecules through
reasonably well-known pathways of diagenetic alteration (Peters
et al., 2005). Most paleobiologically informative biomarkers are
structurally related to steroids, triterpenoids and photosynthetic
pigments of various types (Ourisson and Albrecht, 1992; Brocks and
Summons, 2003; Volkman, 2005).

For a biomarker extracted from a rock to be considered a molecular
fossil, we must be able to assess its syngeneity: that is, to discern
whether or not a particular molecule derives from the original input
of organic matter to a sediment. There are two principal routes for
non-syngenetic biomarker hydrocarbons to be introduced into sedimentary
rocks. First, under certain conditions, hydrocarbons can
be widely mobile in sedimentary basins, so bitumen (operationally
defined as the solvent-extractable portion of the organic matter) in
a particular rock can potentially include material that has migrated
between hydraulically connected strata of very different ages. This
phenomenon is central to the accumulation of massive bitumen
deposits - for example, oil reservoirs - in many petroleum systems.
Second, human activity has suffused much of the surface environment
with petroleum-derived hydrocarbons, rendering outcrop
samples of bitumen-poor, thermally mature Precambrian rocks
unsuitable for biomarker analysis. The heavy weathering experienced
by most Archean terrains and their low bitumen contents
mean that surface exposures are generally compromised. Sampling
the subsurface by drilling affords long stretches of pristine stratigraphy,
but necessitates contact of the core samples with drilling
equipment and fluids. The trace quantities of biomarker molecules
extractable from even the best-preserved Archean strata mean
that attention to the possibility of even low-level contamination is
essential to establishing a genuine molecular fossil record (Sherman
et al., 2007).

To date there have been few detailed studies of biomarkers from
Archean deposits. Two in recent years (Brocks, 2001; Eigenbrode,
2004) focused on biomarker analysis of resource-exploration cores
drilled in the ca. 2.7-2.5 Ga Hamersley Basin, Western Australia.
In both cases the syngeneity of the proposed Archean hydrocarbons
was carefully assessed, although the approaches and methods
differed. Both authors referred to the geological isolation of the
basin, the structural integrity of the sediments studied and the
absence of younger petroleum source rocks from the Hamersley
Basin as valid reasons for discounting contamination from hydrocarbons
migrated long distances from adjacent petroleum-prone
basins. Brocks (2001) established that the identified Hamersley
Basin hydrocarbons were associated with kerogenous shales, that
they were at concentrations significantly above procedural blanks
and that they showed maturity patterns, especially in respect to
aromatic steroids, adamantanes and polyaromatic hydrocarbons,
that were consistent with the prehnite-pumpellyite to lower greenschist
metamorphic grade of the host rocks. He also showed that
there was significant stratigraphic variation in biomarker compositions,
that the biomarkers showed typical Precambrian patterns and
that inappropriate compounds (e.g. plant terpanes) were not evident.
Brocks et al. (2003a) concluded that the biomarkers from the
Hamersley and Fortescue groups were ‘probably syngenetic with
their Archean host rock’ although they could not absolutely rule
out anthropogenic hydrocarbon contamination introduced during
drilling, transport and storage of the cores.

Eigenbrode (2004) examined samples representing a wider suite
of lithologies (and, therefore, paleoenvironments) and reached similar
findings for those wells that were analyzed in common with
Brocks (2001). Eigenbrode detected significant dispersion in the
δ13C values of kerogen that was interpreted in the context of
different paleoenvironmental settings and a secular trend toward
increasing apparent oxygenation of shallow water environments
(Eigenbrode and Freeman, 2006). Further, it was shown that some
biomarker ratios were strongly correlated to the δ13C values of
associated kerogens or to dolomite abundance, supporting a syngenetic
relationship with host sediments (Eigenbrode, 2004, 2008;
Eigenbrode et al., 2008).

Another approach to molecular analysis of Precambrian organic
matter is the study of hydrocarbons trapped in fluid inclusions.
Hydrocarbon-bearing fluid inclusions in Proterozoic rocks from
Australia (Dutkiewicz et al., 2003a,b, 2004; Volk et al., 2003;
George et al., 2008), Gabon (Dutkiewicz et al., 2007) and Canada
(Dutkiewicz et al., 2006) dating as far back as >2.2 Ga have
yielded suites of biomarker compounds including steroids and
triterpenoids. Fluid inclusions present unique bitumen trapping
conditions, including high fluid pressures and the absence of clay
minerals, and the opportunity to assess the inclusion entrapment in
the context of the alteration history of the host rock. Fluid inclusion
analysis has provided insight into both the molecular fossil record
of Precambrian life and the chemical behavior of biomarker hydrocarbons
at high temperatures and pressures (George et al., 2008).

In this study, we examined the characteristics of organic matter
in two cores (GKF01 and GKP01) drilled as part of the Agouron
Griqualand Drilling Project with the express purpose of obtaining
fresh, minimally contaminated late Archean sediments for
sedimentological, geochemical and paleontological analyses (Beukes et
al., 2004; Schroder et al., 2006; Sumner and Beukes, 2006). Importantly,
these cores represent some of the first to be recovered from
late Archean strata using protocols specifically designed to minimize
potential for organic contamination throughout the drilling,
handling and storage process. Given the extremely low quantities
of extractable hydrocarbons in even the best-preserved Archean
sediments, minimization of contamination is essential to avoid
overprinting the indigenous organic signatures. A detailed discussion
of these measures and the biomarker analysis methods used
in this work is presented elsewhere (Sherman et al., 2007). Here
we report the results of analysis of Archean biomarkers and
correlations of their patterns across the same formations in two cores
drilled 24 km apart.

2.  Geological  context 

The Transvaal Supergroup consists of a mixed siliciclastic-carbonate
ramp that grades upward into an extensive carbonate
platform overlain by banded iron formation. It was deposited on
the Kaapvaal Craton between 2670 and 2460 Ma (Armstrong et al.,
1991; Barton et al., 1994; Walraven and Martini, 1995; Sumner and
Bowring, 1996). The platform is up to 2 km thick, with predominantly
peritidal facies in the north and east and mostly deeper facies
to the south and west. Platform, slope and basinal sediments are
preserved between Griquatown and Prieska (Beukes, 1987; Sumner
and Beukes, 2006).

Two scientific cores, Agouron Institute cores GKP01 and GKF01
(hereafter referred to as GKP and GKF), were drilled through slope
facies to provide geochemically fresh samples. The two cores are
correlated to each other with 14 tie lines using volcanic and impact
spherule layers, shale geochemistry, and distinctive facies
distributions (Schroder et al., 2006). They are correlated to the shallower
platform with five time lines using impact spherule layers and
sequence stratigraphy (Sumner et al., unpublished). Water depths
represented in the cores range from wave base to hundreds of
meters, with GKP comprised of generally deeper-water facies than
GKF.
Depositional facies in the cores are well preserved. Most of
the platform experienced, at most, sub-greenschist grade
metamorphism (Button, 1973; Miyano and Beukes, 1984). However, supergene
alteration during late fluid flow produced local Pb-Zn, fluorite,
and gold deposits along fluid flow paths (Martini, 1976; Clay, 1986;
Duane et al., 1991, 2004; Tyler and Tyler, 1996) and caused
dolomitization along the platform margin, including rocks in GKP and GKF.
Fluid flow was driven by the ca. 2 Ga Kheis orogeny to the west
(Beukes and Smit, 1987; Duane and Kruger, 1991), and this event
probably produced the peak alteration temperatures experienced
by these rocks (Duane et al., 2004).

3.  Methods 

3.1.  Sample  selection  and  correlation 

Samples were chosen both to span the full stratigraphic
time represented by the cores and to represent the breadth of
lithofacies. Sampling also focused on collecting temporally equivalent
samples from the two cores, utilizing detailed inter-core
stratigraphic correlations (Schroder et al., 2006; Sumner et al.,
unpublished). This sampling approach produced pairs of samples
that can be used to compare biomarker preservation and
composition in temporally equivalent but environmentally distinct
samples. The depths of the samples analyzed, as well as
their formations, rock types and correlations, are indicated in
Table 1. Detailed images of the full lengths of the cores and
stratigraphic columns are available as part of the Agouron-Griqualand
Paleoproterozoic Drilling Project Online Database at
http://general.uJ.ac.za/agouron/index.aspx. After recovery, cores
were stored in aluminum trays and samples selected for biomarker
analysis were wrapped in two layers of precombusted aluminum
foil.

3.2.  Materials 

Solvents (hexane, dichloromethane, and methanol) used in sample
extraction and equipment cleaning were high purity grade
(OmniSolv, EMD Chemicals). De-ionized (DI) water and hydrochloric
acid (HCl) used to process samples were extracted five times
with dichloromethane (DCM). Glassware, aluminum foil, silica gel,
and glass wool were combusted at 550 °C for 8 h and quartz sand
(Accusand, Unimin Corp.) was combusted at 850 °C for 12 h prior
to use. Metal tools used to process samples were cleaned with DI
water and then rinsed five times with methanol, DCM, and hexane.
Crushing tools (described below) were cleaned of particulates with
combusted quartz sand and then ultrasonicated for 30 min each in
methanol and DCM.

3.3.  Sample  preparation 

Quarter core samples (mostly ¼ NQ, 47.6 mm diameter) were
approximately 20-50 cm in length and 50-200 g in weight. They
were processed in batches of six with at least one procedural sand
blank per batch. Through a series of experiments (Sherman et
al., 2007), we found that foreign, less mature organic matter was
present on the outsides of the cores and that this component could
not be removed simply by ultrasonication in organic solvents. As a
result, we found it necessary to remove the outer 3-5 mm of each
exposed surface of the core pieces. Samples were cut using a
sectioning saw with a water lubricated diamond-edged blade (UKAM).
DCM-extracted water was used to lubricate the saw blade. Between
samples, the saw was washed with DI water and the blade was
washed and ultrasonicated in methanol and DCM. After cutting,
the inner core pieces were ultrasonicated in DCM-cleaned water,
methanol, and DCM to further clean their outer surfaces. The samples
were then crushed to <5 mm pieces using a stainless steel
mortar and pestle. These pieces were then ground to a fine powder
(sub-140 mesh) in a SPEX 8510 Shatterbox using a stainless steel
puck mill that was modeled after a SPEX 8507 mill (Sherman et al.,
2007).

3.4.  Extraction  and  fractionation 

Each powdered sample (40-90 g) was divided into 60 ml vials
(~15 g per vial) and 25-30 ml of DCM was added to each vial.
The vials were then ultrasonicated for 30 min and the extraction
solvent decanted after allowing the rock powder to settle. This
process was repeated twice and the solvent from the three
extractions was pooled. These total bitumen extracts were filtered in
wide-bore columns over ~3 cm of silica gel and then treated with
acid-activated copper to remove elemental sulfur. The extracts were
separated into saturated, aromatic, and polar fractions by liquid
column chromatography in glass pipette columns packed with ~8 cm
of silica gel. Saturated hydrocarbons were eluted with hexane (3/8
column volume), aromatic hydrocarbons with hexane:DCM (4:1
(v/v); 4 column volumes), and polars with DCM:methanol (7:3
(v/v); 4 column volumes).

3.5.  Preparation  of  bitumen  II 

After the samples were extracted as described above, a subset
of powders was demineralized by acidification. About 30 g of
each sample was placed into aqua regia- and solvent-cleaned Teflon
tubes (~12 g per tube). Acids were then added to the samples and
allowed to react as follows: 6N DCM-extracted HCl for 24-48 h
(to remove carbonates), 48% HF for at least 72 h (to dissolve
silicates), and 6N DCM-extracted HCl again for 24 h (to remove fluoride
precipitates). The samples were then washed several times in
DCM-extracted water. The resulting powders were re-extracted following
the procedures described above and were analyzed as “bitumen II.”

3.6.  GC-MS  and  GC-MS-MS  (MRM)  analyses 

The saturated and aromatic hydrocarbon fractions were then
gently dried under a stream of nitrogen to a volume of roughly
80 µl. 10 ng of D4 (d4-C29-ααα-ethylcholestane; Chiron Laboratories,
Inc.) and 1 µg of aiC22 (3-methylheneicosane, 99+% purity;
ULTRA Scientific) were added to the saturated fraction and 413 ng of
D14 (d14-para-terphenyl, 98 atom% deuterium; Cambridge Isotope
Laboratories, Inc.) was added to the aromatic fraction as internal
standards.

The saturated and aromatic hydrocarbon fractions were analyzed
by gas chromatography-mass spectrometry (GC-MS) in full
scan mode and by selected ion monitoring respectively. Biomarker
analyses of the saturated fraction were conducted by metastable
reaction monitoring GC-MS (GC-MS-MS or MRM). Each of these
analyses was conducted on a Micromass AutoSpec Ultima equipped
with an Agilent 6890N gas chromatograph. The GC was fitted with
a DB-1 fused silica capillary column (60 m × 0.25 mm i.d., 0.25 µm
film thickness; J&W Scientific) and used He as the carrier gas. During
each analysis, the GC ramped from 60 to 150 °C at 10 °C/min,
then at 3 °C/min to 315 °C, which was held for 24 min. The AutoSpec
source was operated in EI mode at 250 °C, 70 eV ionization energy,
and 8 kV accelerating voltage. Full scan analyses were conducted
over a mass range of 50-600 Da at a rate of 0.8 s/decade with a 0.2 s
inter-scan delay. Data were acquired and processed using MassLynx
4.0 (Micromass Ltd.). Compounds were quantified based on manual
peak integration and comparison to the internal standards (using
m/z = 85 for full scan analyses).
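
The internal-standard quantification described above can be illustrated with a minimal sketch. It assumes a unit response factor between analyte and standard and uses hypothetical function names and peak areas; none of this is taken from the authors' data processing.

```python
def quantify_by_internal_standard(analyte_area, standard_area,
                                  standard_mass_ng, response_factor=1.0):
    """Estimate analyte mass (ng) from peak areas and a spiked standard.

    response_factor is an assumption (1.0 = analyte and standard respond
    equally); a calibrated value would replace it in practice.
    """
    if standard_area <= 0 or response_factor <= 0:
        raise ValueError("standard_area and response_factor must be positive")
    return (analyte_area / standard_area) * standard_mass_ng / response_factor


def concentration_ppb(analyte_mass_ng, rock_mass_g):
    """Convert an analyte mass to ng per g of rock (ppb by weight)."""
    if rock_mass_g <= 0:
        raise ValueError("rock_mass_g must be positive")
    return analyte_mass_ng / rock_mass_g


# Hypothetical numbers: a peak with half the area of the 10 ng D4 standard,
# extracted from 50 g of rock powder.
mass_ng = quantify_by_internal_standard(500.0, 1000.0, 10.0)
print(mass_ng, concentration_ppb(mass_ng, 50.0))
```

Keeping the response factor as an explicit parameter makes the equal-response assumption visible rather than hiding it in the arithmetic.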
3.7. Bulk composition by LOI

Bulk weight percent organic matter, weight percent carbonate,
and weight percent ash of each sample were estimated using
loss on ignition (LOI) techniques. 300-400 mg of rock powder was
weighed into a small ceramic crucible and combusted in a
Barnstead/Thermolyne 30400 furnace at 550 °C for 4 h. Once cool, the
powders were weighed, and mass loss at this step taken as the
organic matter content. The powders were then re-combusted at
950 °C for 2 h, and the further mass loss taken as weight percent
carbonate. Material remaining after the final combustion (primarily
silicate, oxide and sulfide minerals) was considered ash.
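
The LOI bookkeeping described above can be sketched as follows. The function name and example masses are hypothetical, and the sketch follows the convention stated in the text (the second mass loss is reported directly as weight percent carbonate, with no stoichiometric scaling of CO2 to carbonate mass).

```python
def loi_composition(initial_mg, after_550_mg, after_950_mg):
    """Return weight percent organic matter, carbonate and ash from LOI masses.

    initial_mg   : powder mass before combustion
    after_550_mg : mass after the 550 C step (loss taken as organic matter)
    after_950_mg : mass after the 950 C step (further loss taken as carbonate)
    """
    if initial_mg <= 0:
        raise ValueError("initial mass must be positive")
    if not (initial_mg >= after_550_mg >= after_950_mg >= 0):
        raise ValueError("masses must not increase between steps")
    return {
        "organic_matter_wt_pct": 100.0 * (initial_mg - after_550_mg) / initial_mg,
        "carbonate_wt_pct": 100.0 * (after_550_mg - after_950_mg) / initial_mg,
        "ash_wt_pct": 100.0 * after_950_mg / initial_mg,
    }


# Hypothetical crucible masses within the 300-400 mg range used in the text.
print(loi_composition(400.0, 380.0, 300.0))
```

By construction the three percentages sum to 100, which gives a quick consistency check on recorded masses.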

3.8. EA-IRMS analyses

Rock powders for kerogen isotope measurements were first
demineralized using 6N HCl and 48% HF as described above. After
this treatment, 1N HNO3 was added to the samples to dissolve
sulfide minerals. The largely demineralized powders were then
dried and weighed in triplicate into tin cups (0.5-0.8 mg each).
Carbon isotopic compositions were determined using a Fisons (Carlo
Erba) NA 1500 elemental analyzer fitted with a Costech Zero Blank
Autosampler and coupled to a Thermo Finnigan Delta Plus XP
isotope ratio mass spectrometer.
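
Carbon isotopic compositions of this kind are conventionally reported in delta notation relative to a standard, and triplicate measurements are summarized by a mean and spread. The sketch below is illustrative only: the reference ratio `R_VPDB` is an assumed, commonly cited value, and the function names and triplicate values are hypothetical rather than taken from this study.

```python
from statistics import mean, stdev

# Assumed 13C/12C ratio of the VPDB standard (a commonly cited value;
# substitute the value adopted by the laboratory).
R_VPDB = 0.0111802


def delta13c_permil(r_sample, r_standard=R_VPDB):
    """Delta value in per mil: (R_sample / R_standard - 1) * 1000."""
    if r_standard <= 0:
        raise ValueError("r_standard must be positive")
    return (r_sample / r_standard - 1.0) * 1000.0


def summarize_triplicate(values):
    """Return (mean, 2-sigma) for replicate delta values."""
    return mean(values), 2.0 * stdev(values)


# Hypothetical triplicate kerogen measurements (per mil).
print(summarize_triplicate([-35.0, -36.0, -34.0]))
```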

4.  Composition  of  Archean  organic  matter 

The overall composition of the organic matter present in the
Griqualand cores is shown schematically in Fig. 1. The Griqualand
rocks preserve abundant organic matter - exceeding 10 wt% in some
samples (Table 1) - but very little of that organic matter is present as
extractable bitumen. In all cases, in excess of 99.99% of the organic
matter is in the form of insoluble, macromolecular kerogen (Fig. 1)
closely associated with the mineral matrix. These characteristics are
a result of the extensive diagenetic and thermal histories of the host
rocks. The carbon isotope composition of the kerogen isolated from
the samples analyzed for biomarkers closely follows the organic
carbon isotope stratigraphy for the cores determined by Fischer et
al. (this volume) (Fig. 2E).

4.1. Composition of extractable bitumens

While solvent-extractable bitumen comprises a tiny fraction of
the organic matter in these late Archean rocks, it is the portion
most amenable to detailed molecular analyses that might shed
paleobiological and paleoenvironmental light on the organisms
and ecosystems that produced the preserved organic material. The
bulk of the results presented herein are from detailed
characterizations of the molecular compositions of bitumen I (extracts of
whole-rock powders) from cores GKF and GKP, and bitumen II
(extracts of demineralized kerogen) from a subset of samples from
core GKF. Figs. 2, 5, 6 and 8 show selected indices of the
molecular composition of these three bitumens through the two cores;
details of these downcore variations are presented in the following
sections. Samples correlated between cores GKF and GKP on
sedimentological and stratigraphic grounds (i.e., chosen a priori
before organic analysis) are plotted in those figures at equivalent
depths on a representative stratigraphy, indicated at the left of
each figure. Results from analysis of three samples of the
Permo-Carboniferous Dwyka formation diamictite, which unconformably
overlies the late Archean strata in both cores, are also plotted.
Analysis of the Dwyka diamictite, while clearly not informative as
to late Archean biogeochemistry, proved very useful as a test of
the integrity and syngeneity of molecular fossils in the underlying
strata; these results are discussed below in Section 5.5.

Fig. 1. Schematic of organic matter composition in late Archean core
samples. Areas of subdivisions within boxes are proportional to the
abundance of different pools of constituents.

Fig. 2. (A-E) Selected organic matter composition parameters for samples
from cores GKF and GKP. Sample depths are indicated at left on a
representative stratigraphy and samples correlated between the two cores
are plotted at equivalent vertical positions. GKF samples indicated in
bold were analyzed for bitumen II. Errors in composition parameters (2σ)
are plotted; where bars do not appear, they are smaller than the symbol
size. Uncertainties in correlations were generally smaller than symbol
size in the vertical dimension. Compound abbreviations: DMN -
dimethylnaphthalene, MP - methylphenanthrene. (E) Carbon isotope
composition of kerogen isolated from core samples. Lines are the δ13C
values for total organic carbon determined by Fischer et al. (this
volume). Formation abbreviations at left: D - Dwyka, K - Kuruman, KN -
Klein Naute, N - Nauga, R - Reivilo, M - Monteville, L - Lokammona, B -
Boomplaas, V - Vryburg.

Overall, molecular indices of bitumen composition did not
significantly correlate with quantitative measures of host rock
composition including organic carbon, carbonate, iron and sulfur
contents and δ13C values. While these quantities are an incomplete
description of the complex rock matrix, such correlations
have been observed in some other studies of Precambrian strata
(Eigenbrode, 2008; Eigenbrode et al., 2008) and interpreted as a
paleoenvironmental signal of coupling of organic matter source
and depositional environment. The absence of these correlations
in the Transvaal dataset may reflect a weaker coupling of the
source of the organic matter (microbial activity) and the mode
of its incorporation into the sediments with the inputs of clastic
and inorganic chemical components. As elemental and isotopic
compositions of the Transvaal strata show significant variation on
centimeter to meter scales (Ono et al., this volume; Sumner et al.,
unpublished), this coupling would have to have been quite strong
for bulk-molecular correlations to hold over hundreds of meters of
stratigraphy. The correspondence in bitumen composition between
samples correlated on stratigraphic grounds (Section 5.3), however,
suggests that longer-term trends in organic matter sourcing
and/or depositional environment are reflected in the cores. It is also
possible that more detailed characterization of the rock mineral
and organic matrix compositions, including quantification of specific
mineral phases and examination of microscale organic-mineral
associations, will further reveal depositional controls of bitumen
composition.

4.1.1. Saturated and aromatic hydrocarbons

Bitumens from the Agouron cores are composed almost exclusively
of saturated and aromatic hydrocarbons; more polar oxygen-
and nitrogen-containing functionalized compounds (e.g., carboxylic
acids, porphyrins, phenols, etc.) were below detection
limits. Partially unsaturated compounds (e.g., alkenes) were also
not detected. Most bitumens contained more aromatic than saturated
hydrocarbons, consistent with the highly aromatic character
of the kerogen (Table 1). All of these characteristics are expected of
highly thermally mature bitumen-kerogen associations where
diagenetic and catagenetic reactions have proceeded for long periods
and where the disproportionation of the original organic matter
into disordered, aromatised carbon and light hydrocarbons is nearing
completion.

Fig. 3. (A) Typical results of MRM GC-MS-MS analysis for hopane
biomarkers. C27 through C31 and A-ring methylated C31 hopanes are shown;
homohopanes to C35 were routinely detected as well, in lower abundances.
MRM transitions are indicated at right, and all chromatograms are shown
to the same absolute scaling. Beneath each sample chromatogram is the
equivalent trace from analysis of the parallel procedural blank. Compound
abbreviations: Ts - 18α-22,29,30-trisnorneohopane, Tm -
17α-22,29,30-trisnorhopane, DNH - dinorhopane, αβ - 17α(H),21β(H) hopane,
C29 Ts - 18α-30-norneohopane, βα - 17β(H),21α(H) hopane (moretane),
30-nor - C30 30-norhomohopane, 2α-Me - 2α-methylhopane, 3β-Me -
3β-methylhopane, γ - gammacerane. (B) Typical results from sterane
biomarker analysis, showing C26 through C30 regular steranes and
diasteranes. Regular sterane structures shown at left; diasteranes have
methyl rearrangements to C5 and C14. Compound abbreviations: ααα -
5α(H),14α(H),17α(H) sterane, αββ - 5α(H),14β(H),17β(H) sterane.

Table 2

Sample-to-blank (S/B) ratios for biomarker compounds, shown as averages
across analyses of bitumen I extracts from Transvaal Supergroup samples
(n = 23). Where a peak corresponding to a given isomer was not identified
in the analysis of a procedural blank, a baseline noise peak of
equivalent retention time was integrated to reflect the detection limit
of the mass spectrometer during that analytical session.

Saturated hydrocarbon fractions extracted from these sediments
are dominated by straight-chain n-alkanes, with a
condensate-like carbon-number distribution peaking between C11
and C20. The acyclic isoprenoids pristane and phytane, generally
considered to be the products of diagenesis of the side-chain of
chlorophyll, were detected in all samples. The pristane/phytane
ratios of the late Archean bitumens are low (0.4-1.5; Fig. 2A),
which is consistent with (though not necessarily indicative of)
deposition in a saline, anoxic environment. The aromatic fractions
(as analyzed by SIM-GC-MS) are composed primarily of
low-molecular weight polycyclic aromatic hydrocarbons (PAH;
naphthalene and phenanthrene) and their alkylated homologues.
Larger PAHs, including fluoranthene, pyrene and (in some samples)
perylene were detected at much lower abundances. The
substitution patterns of dimethylnaphthalenes and methylphenanthrenes
showed dominance of the thermodynamically favored
β-isomers over the more sterically hindered α-isomers (Fig. 2C and
D).

The preponderance of low-molecular weight compounds in the
Transvaal Supergroup bitumens also means that they are very
susceptible to evaporation during extraction and analysis. This
applies particularly to n-alkanes <C14 and naphthalenes and may
have affected the apparent bitumen yields, which can be considered
lower limits for in situ concentrations. With the possible exception
of chlorophyll-derived pristane and phytane, little can be said
about the specific biological sources of these simpler hydrocarbons
or their precursors.

4.2.  Biomarkers 

Though they represent only a small proportion of the bitumen
extracts (Fig. 1), the cyclic terpenoids - hopanes, steranes
and cheilanthanes - are the most information-rich molecules with
regard to interpretations of the paleobiological source of the organic
matter and its diagenetic history. These molecules are unambiguously
biogenic: the hopanoids and steroids, in particular, are
well-characterized as the biosynthetic products of the enzymatic
cyclization of squalene and oxidosqualene, respectively (Rohmer
et al., 1984; Ourisson et al., 1987; Summons et al., 2006; Fischer
and Pearson, 2007), which enables a more informed interpretation
of these compounds as molecular fossils. Typical results of MRM
GC-MS-MS biomarker analysis of a saturated hydrocarbon fraction
from a bitumen extract are shown in Fig. 3. These classes of
cyclic terpenoids were present at ppb to sub-ppb levels (Table 1),
with concentrations of individual compounds at the parts per trillion
level by weight. Despite these extremely low yields, a broad
diversity of biomarker structures could be consistently detected in
bitumen extracts over the full depth of Archean strata intersected
by both cores. Analytical sample-to-blank ratios (i.e., amount of a
given compound detected in a core sample compared to that in
the parallel procedural blank) for biomarker compounds are listed
in Table 2. For many compounds, the ‘blank’ reflects the detection
limit of the mass spectrometer (baseline noise above which a peak
must rise) rather than the identified presence of a compound in the
procedural sand blank. The relative contributions of hopanes, steranes
and cheilanthanes to the total biomarker content are shown in
Fig. 4. The three types are present in roughly equal abundance, with
a slight preference for steranes that may or may not be significant
with regard to the source and/or diagenesis of the organic matter
(see Section 6.2).
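
The two bookkeeping steps described above, the sample-to-blank ratio with a noise-peak fallback and the normalization of the three biomarker classes for a ternary plot, can be sketched as follows. Function names and numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
def sample_to_blank(sample_area, blank_area, noise_area):
    """Sample-to-blank ratio for one compound.

    If no peak was identified in the procedural blank (blank_area is None
    or zero), the integrated baseline noise at the same retention time is
    used instead, so the ratio reflects the instrument detection limit.
    """
    denominator = blank_area if blank_area else noise_area
    if denominator is None or denominator <= 0:
        raise ValueError("need a positive blank or noise area")
    return sample_area / denominator


def ternary_fractions(hopanes, steranes, cheilanthanes):
    """Normalize three summed class abundances so they total 1."""
    total = hopanes + steranes + cheilanthanes
    if total <= 0:
        raise ValueError("total biomarker abundance must be positive")
    return hopanes / total, steranes / total, cheilanthanes / total


# Hypothetical peak areas and class sums.
print(sample_to_blank(120.0, None, 10.0))
print(ternary_fractions(3.0, 4.0, 3.0))
```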

4.2.1.  Hopanes 

Several series of hopanes with 27-35 carbon atoms were
detected in all bitumens (Fig. 3A). These include: 17α(H),21β(H)-hopanes;
2α- and 3β-methylhopanes; 17β(H),21α(H)-hopanes
(moretanes); and several rearranged and norhopanes (Fig. 3A).
Average sample-to-blank ratios generally exceeded 10 for the
principal C27-C31 isomers (Table 2). The stratigraphic distributions
of selected hopane isomers are shown in Fig. 5. The isomer
distributions of the hopanes show predominance of the more
thermodynamically stable forms, consistent with the high thermal
maturity of the Transvaal host rocks. In particular, all the late
Archean bitumens show high Ts/Tm ratios (>0.5; Fig. 5A) and low
moretane/hopane ratios - approaching the thermodynamic endpoint
of 0.05 - in the C29 and C30 homologues (Fig. 5C and E). These
indices of high maturity are consistent between bitumens I and II
(Fig. 5A, C and E), suggesting that the two bitumen pools have the
same thermal history.

Fig. 4. Ternary diagram showing the composition of the cyclic terpenoid
biomarker fraction of the bitumen extracts. Sterane, hopane and
cheilanthane contents are calculated as the sum of all detected
pseudohomologues and isomers.

C28 dinorhopanes, including 28,30- and 29,30-dinor
pseudohomologues, are present to varying degrees in all the late
Archean bitumens (Fig. 3A). 28,30-Dinorhopane contents also varied
between the cores, being higher in GKP than GKF below
the uppermost Nauga formation (Fig. 5B); this difference is
discussed further in Section 5.3. The 30-norhopane series, commonly
observed in Phanerozoic oils and sediments, is also present in the
Transvaal Supergroup bitumens. Fig. 3A identifies the elution
position of the peak identified as 30-norhomohopane while Fig. 5 shows
GKF and GKP down-core variations in the abundances of the C29
(column D) and C30 (column F) members of the series relative to C30
αβ-hopane. This series of hopanoids has been identified as
particularly abundant in sediments with carbonate lithologies (Subroto
et al., 1991, 1992) including those of Precambrian age (Brocks and
Summons, 2003) and therefore it is not surprising to find it present
in GKF and GKP. The C29H/C30H ratio varies between 0.8 and 1.2 for
the Transvaal Supergroup sediments (Fig. 5D) which is the range
commonly found in Phanerozoic oils from carbonate source rocks
but above the range (0.3-0.7) observed for oils derived from shales
(Knoll et al., 2007). However, within the error limits, no definitive
relationship to carbonate content could be discerned in the
C29H/C30H ratios of GKP and GKF.

In all of these indices of hopane content, the Permian Dwyka
Formation in the uppermost portion of the cores shows a distinct and
less mature composition. The relevance of this contrast in assessment
of the syngeneity of the biomarkers from the Agouron cores
is discussed in detail in Section 5.5.
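
The hopane indices discussed in this section are simple ratios of integrated peak areas. The sketch below shows one way they might be computed; the dictionary keys and peak areas are hypothetical, and Ts/Tm is taken as the plain ratio written in the text rather than the Ts/(Ts+Tm) form used in some studies.

```python
def hopane_indices(areas):
    """Compute selected hopane ratios from a dict of peak areas.

    Expected (hypothetical) keys:
      "Ts", "Tm"          C27 trisnorneohopane and trisnorhopane
      "C29_ab", "C30_ab"  C29 and C30 alpha-beta hopanes
      "C30_ba"            C30 beta-alpha hopane (moretane)
    """
    for key in ("Tm", "C30_ab"):
        if areas[key] <= 0:
            raise ValueError(f"{key} peak area must be positive")
    return {
        "Ts/Tm": areas["Ts"] / areas["Tm"],
        "C30 moretane/hopane": areas["C30_ba"] / areas["C30_ab"],
        "C29H/C30H": areas["C29_ab"] / areas["C30_ab"],
    }


# Hypothetical peak areas for a thermally mature bitumen.
example = {"Ts": 60.0, "Tm": 40.0, "C29_ab": 100.0,
           "C30_ab": 100.0, "C30_ba": 5.0}
print(hopane_indices(example))
```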

A-ring methylated hopanes were also detected in the Transvaal
Supergroup bitumens (Fig. 3A). 2α-Methylhopanes with 31-35
carbon atoms were found, while 3β-methylhopanes were less
abundant and could usually be detected only at C31.
3-Methylhopanoids, precursors of 3β-methylhopane, are known
principally from aerobic methanotrophic bacteria, though they
have been found in Acetobacter sp. as well (Zundel and Rohmer,
1985a, 1985b). While their biological source could in principle be
elucidated through isotopic analysis, the extremely small quantities
extractable from the core samples preclude this at present. The
presence of 3β-methylhopanes in the Transvaal bitumens is consistent
with proposals for active methane cycling in late Archean
marine ecosystems (Hayes, 1983; Eigenbrode and Freeman, 2006).
Strong stratigraphic patterns are not apparent (Fig. 6B), though the
portion of the Transvaal Supergroup sampled by the Agouron cores
does not include intervals with strongly 13C-depleted organic carbon
(few points are below -40‰; Fig. 2E) that would indicate a large
contribution of recycled methane to sedimentary organic matter.

High relative abundances of 2α-methylhopanes (C31
2-MeH/C30H > 0.05) are typical of Precambrian bitumens (Knoll
et al., 2007), and most of the Transvaal samples are in this range
(Fig. 6A). Summons et al. (1999) suggested that 2α-methylhopanes
are markers for oxygenic cyanobacteria. This interpretation is
certainly consistent with the presence of steranes, whose precursor
steroids require O2 for their biosynthesis (see Section 6.2). Rashby
et al. (2007), however, have recently reported the biosynthesis of
2-methylhopanoids by the α-proteobacterium Rhodopseudomonas
palustris, including during photosynthetic growth under anaerobic
conditions. Whatever its identity, the biological source of the
2-methylhopanoids to the Transvaal sediments must have been
ecologically widespread and persistent in the environment, as
these biomarkers are present throughout the >200 Myr period
recorded in the Agouron cores. The higher abundance of
2α-methylhopane in core GKF relative to GKP (Fig. 6A) - particularly
through the deposition of the Reivilo and Nauga formations where
facies in GKF represent significantly shallower depositional
environments than those in GKP - suggests that the 2-methylhopanoid
producing organisms may have been most abundant in shallower-water,
platform environments. Notably, these are the same kinds of
environments where 2-methylhopanoid producing cyanobacteria
have left especially strong biomarker (and microfossil) signatures
through the Phanerozoic (Summons et al., 1999; Knoll et al., 2007).

4.2.2.  Cheilanthanes  and  other  terpanes 

Tricyclic biomarkers of the cheilanthane type were found in all
the Archean bitumens, typically at abundances similar to those of
hopanes and steranes (Fig. 4). 13β(H),14α(H) homologues from C19
to C26 were detected, with the C23 being consistently the most
abundant member of the series (Fig. 6C). As expected for the regular
isoprenoid side-chain, C22 was the least abundant homologue. C24
tetracyclic terpane was also detected in many samples. As the
biological source(s) of cheilanthanes and C24 tetracyclic terpane are not
known (Brocks et al., 2003b), little paleobiological interpretation
can be made of their presence. To the extent that they are
syngenetic, however, the tricyclic biomarkers in the Griqualand cores
confirm that the biosynthetic pathways leading to the cheilanthane
and C24 tetracyclic terpane precursors were operative by the late
Archean (Brocks, 2001; Eigenbrode, 2004).

Gammacerane was also detected at low abundance in many of the Transvaal
bitumens (Fig. 6D). The biological precursor of gammacerane is tetrahymanol,
a lipid with both bacterial (Kleemann et al., 1990; Bravo et al., 2001) and
eukaryotic (Harvey and McManus, 1991; Sinninghe Damsté et al., 1995)
sources. Some bacterivorous ciliates produce tetrahymanol as a steroid
substitute when feeding below the oxic-anoxic transition in stratified
waters, so gammacerane is commonly seen as a marker of water-column redox
stratification. While such stratification is certainly a plausible scenario
for an Archean marine environment, the biological source of tetrahymanol was
not necessarily predatory protozoa. The presence of tetrahymanol in a number
of proteobacteria, notably anoxygenic phototrophs (Kleemann et al., 1990;
Rashby et al., 2007), makes a prokaryotic source more likely. Gammacerane,
unlike the C31 2α-methylhopane, does not show higher abundance in GKF
relative to GKP (compare Fig. 6A and D), suggesting some biological and/or
geographical separation of the sources of those two biomarkers.

4.2.3.  Steranes 

Series of steranes with 26-30 carbon atoms were also detected in all
bitumens analyzed (Fig. 3B). The predominant isomers of all homologues were
5α(H),14α(H),17α(H) and 5α(H),14β(H),17β(H) regular steranes (both 20S and
20R epimers) and rearranged 13β(H),17α(H)-diasteranes. C27 to C29 homologues
are dominant, with C26 and C30 steranes comprising only 1.9-5.4% and
1.5-6.2% of total steranes, respectively. Average sample-to-blank ratios for
compounds of the C27 to C29 series range from 10.6 to 45.8 (Table 2), and
certain isomers in individual samples were present at up to 248 times their
abundance in the parallel procedural blank. Among the C27-C29 steranes, the
three carbon numbers show similar abundances, with a slight preference for
C27 homologues (Fig. 7). The carbon-number distribution of steranes is very
consistent between cores GKF and GKP and between bitumens I and II in GKF.
The sterane carbon-number distributions in the late Archean bitumens fall
within the range of compositions of Phanerozoic petroleum (gray line in
Fig. 7), suggesting that the sources of steroids to the late Archean sediment
were not radically different from those of later periods and involved
multiple protistan taxa. Identified C26 steranes included the 21-nor and
27-norcholestanes. C30 steranes included 24-n-propyl regular and
diasteranes. Steranes bearing additional methylation in the ring system
(including 4-methyl and 4,4-dimethyl steranes) were not detected in any of
the Griqualand core samples, despite use of sensitive MRM techniques to
search for them specifically. While such methylsteranes cannot truly be said
to be absent from the rocks - only to be below the detection limit of the
methods employed - it is clear that the precursors of desmethyl steranes
were a much larger proportion of the organic matter input to these sediments
than were methylsteroids.

Fig. 6. (A) and (B) Ratios of A-ring methylated hopanes to desmethyl
homologues. (C) and (D) Selected terpane ratios of cheilanthanes and
gammacerane. Down-core plots as in Fig. 2.
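The carbon-number comparison plotted in Fig. 7 and the sample-to-blank ratios of Table 2 both reduce to simple arithmetic on integrated peak areas. The sketch below illustrates that arithmetic only; the function names and the peak areas are hypothetical and are not values from this study.

```python
def sterane_ternary(c27, c28, c29):
    """Return C27, C28 and C29 sterane abundances as percentages of their sum."""
    total = c27 + c28 + c29
    if total <= 0:
        raise ValueError("summed C27-C29 sterane abundance must be positive")
    return tuple(100.0 * x / total for x in (c27, c28, c29))


def sample_to_blank_ratio(sample_area, blank_area):
    """Ratio of a compound's peak area in a sample to that in the procedural blank."""
    if blank_area <= 0:
        raise ValueError("blank peak area must be positive to form a ratio")
    return sample_area / blank_area


# Illustrative peak areas in arbitrary units (assumed, not measured values).
fractions = sterane_ternary(c27=38.0, c28=32.0, c29=30.0)
ratio = sample_to_blank_ratio(sample_area=450.0, blank_area=10.0)
```

Each sample then plots as one point on the ternary diagram at the three returned percentages, which sum to 100 by construction.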

Fig. 8 shows the stratigraphic variation in selected sterane isomer
distributions down the two cores. As with the hopanes, the steranes show
isomer distributions indicative of high thermal maturity. Specifically, the
sterane isomers are distributed towards thermally favored configurations
around C14 and C17 (ααα to αββ; Fig. 8D) and C20 (R to S; Fig. 8E).
Diasteranes, which arise from sterene intermediates that have undergone an
acid- or clay-catalyzed backbone rearrangement during early diagenesis
(Rubinstein et al., 1975), are abundant in the late Archean bitumens:
diasterane/regular sterane ratios range from 0.4 up to 2.3. Notably, the
down-core variation in diasterane/regular sterane ratios tracks
exceptionally closely among all the pseudohomologues in bitumen I extracts
from both cores and in bitumen II (Fig. 8A-C). This correspondence among the
C27 to C29 steranes in their degrees of rearrangement is highly suggestive
of a single source and a common diagenetic history for the steranes in the
Transvaal Supergroup bitumens. A detailed discussion of the paleobiological
interpretation of the steranes from the Agouron cores is presented in
Section 6.2.
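One way to put a number on how closely the diasterane/regular sterane ratios of two pseudohomologues track down-core is a correlation coefficient between their depth profiles. The following is a minimal sketch under that assumption; the original analysis is graphical (Fig. 8A-C), and the profile values below are hypothetical.

```python
from math import sqrt


def dia_to_regular(diasterane_area, regular_area):
    """Diasterane/regular sterane ratio for one carbon number in one sample."""
    if regular_area <= 0:
        raise ValueError("regular sterane peak area must be positive")
    return diasterane_area / regular_area


def pearson_r(xs, ys):
    """Pearson correlation between two equally sampled down-core profiles."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical ratios at four correlated depths (not data from the cores).
c27_profile = [0.5, 0.9, 1.4, 2.0]
c29_profile = [0.4, 0.8, 1.3, 1.9]
r = pearson_r(c27_profile, c29_profile)
single_ratio = dia_to_regular(diasterane_area=23.0, regular_area=10.0)
```

A coefficient near 1 would correspond to the visual impression of profiles that rise and fall together; it says nothing by itself about why they do so.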

5.  Syngeneity  of  molecular  fossils 

The determination of the source of biomarker molecules is of paramount
importance in their interpretation. Without confidence that biomarkers
extracted from a sedimentary rock sample actually derive from organic matter
that was a constituent of the original sediment, the antiquity of these
molecules and their utility as molecular fossils is in question. The Agouron
Griqualand Basin cores have provided a unique opportunity to test the
syngeneity of Archean molecular fossils. Since the cores were drilled
without hydrocarbon drilling fluids, were handled and curated with
exceptional care and were analyzed shortly after recovery, the potential for
contamination of the core material was minimized. The very small quantities
of extractable organic matter remaining in these rocks after their long
history make it very easy to overprint syngenetic biomarker contents with
exogenous hydrocarbons, so attention to clean drilling, handling, storage
and analysis is essential. The low procedural blank achieved using the
methods described above (Fig. 3; Table 2) demonstrates that laboratory
contamination during analysis was also minimal. Nevertheless, even when
laboratory blanks are well below sample hydrocarbon contents, the data
resulting from analysis of bitumen extracts must still be scrutinized for
evidence of contamination or characteristics incompatible with syngenetic
organic matter. We did not, for instance, assume that cutting off the outer
portions of the core samples necessarily removed contaminant hydrocarbons
(contra Brocks et al., 2008). We present here five lines of evidence
indicating that the molecular fossils reported here from the Agouron cores
are syngenetic with their host rocks and are late Archean in age. It should
be noted that no single one of these criteria constitutes a sufficient
condition for syngeneity, and we do not take satisfaction of one criterion
(e.g., thermal maturity; cf. Brocks et al., 2008) as prima facie evidence
for indigenous molecular fossils. Rather, it is the simultaneous occurrence
of all these characteristics that demonstrates syngeneity.

Fig. 7. Ternary composition diagram showing sterane carbon-number
distributions in bitumen extracts. The gray line indicates the range of
compositions observed in Phanerozoic petroleum systems, from the GeoMark
Reservoir Fluid Database (http://www.geomarkresearch.com).

5.1.  Thermal  maturity 

All the bitumens extracted from late Archean strata in the Griqualand cores
have molecular compositions indicative of very high thermal maturity.
Syngenetic organic matter must have the same burial and thermal history as
its host rock, so high thermal maturity is to be expected for bitumens from
host rocks that have experienced prehnite-pumpellyite facies metamorphism,
indicating extended periods at temperatures >200 °C. In the part of the
Griqualand West Basin sampled by cores GKF and GKP, metamorphic alteration
temperatures likely peaked ca. 2 Ga (Duane et al., 2004). Overall, the
molecular composition of the late Archean bitumens is consistent with
thermal maturities in the wet-gas zone, as found in the Hamersley Basin
(Brocks, 2001; Brocks et al., 2003a; Eigenbrode, 2004). The condensate-like
distributions of n-alkanes, strong preference for β-substitution of
polycyclic aromatics and overall dominance of low-molecular-weight
hydrocarbons are indicative of high degrees of thermal cracking and
maturation. The trace quantities of biomarkers detected in the bitumens all
show thermodynamically controlled isomerization and stereochemical
rearrangement patterns (see Section 4.2), consistent with high thermal
maturity. As noted by Brocks et al. (2003a), it is difficult to use these
molecular parameters to accurately gauge relative thermal maturity in rocks
of this age, since so few organic geochemical studies of Archean and
Paleoproterozoic rocks have been performed and little is known about the
physicochemical behavior of these hypermature bitumens over their
exceptionally long histories.

While quantitative assessments of the thermal maturity of Archean bitumens
remain challenging, it is clear that the hydrocarbon biomarkers detected in
the Griqualand core samples reflect the time-temperature histories of their
host rocks. Notably, their high thermal maturity constrains contributions to
the bitumens to before the time of peak heating of the host strata, ca. 2 Ga.
The biomarkers in these bitumens are molecular fossils, not functionalized
biochemicals, and certainly do not derive from modern, or even geologically
recent, microbial activity in the host rocks.

5.2.  Absence  of  petrochemical  and  Phanerozoic  signatures 

Petrochemicals - anthropogenic products of refined petroleum - are
ubiquitous in industrialized settings, including drill sites and research
laboratories. As petroleum-derived hydrocarbons are a principal source of
potential contamination, detection of synthetic petrochemical compounds in
bitumen extracts would sharply compromise their interpretation as molecular
fossils. Grosjean and Logan (2007) and Brocks et al. (2008) have highlighted
the utility of branched alkanes with quaternary carbons (BAQCs) as tracers
of polyethylene contamination in drill cores. Using selected ion and
difference chromatograms, the Agouron core extracts were examined for
several series of BAQCs, including 2,2- through 9,9-dimethyl, -diethyl and
mixed alkyl (e.g., ethyl-, methyl-; butyl-, ethyl-) alkanes. No signals
indicative of the presence of BAQCs were detected. The absence of this one
class of contaminants does not exclude the possibility of other forms of
contamination, but lends some confidence that core storage and handling
conditions did not introduce high levels of exogenous petrochemicals into
the samples.

Fig. 8. Selected sterane biomarker parameters for samples from cores GKF and
GKP, plotted down-core as in Fig. 2. Compound abbreviations as in Fig. 3B.

Syngenetic organic matter from Precambrian rocks should also be devoid of
biomarkers known only from later-evolving organisms. The largest class of
these characteristically Phanerozoic biomarkers is the wide variety of
polyterpenoid compounds synthesized by vascular plants. No fossils of plant
terpenoids could be identified in the late Archean bitumens, despite
specific investigation when potential signals of their presence were
detected. In one instance, this investigation led to detection of an
additional series of biomarkers.

MRM-GC-MS-MS analysis of a number of core extracts showed a small, partly
resolved doublet peak in the m/z 412 → 191 reaction chromatogram for a
compound eluting just before C30 hopane at the time expected for oleanane, a
diagnostic marker for angiosperms (indicated with an asterisk in Fig. 9A).
Oleanane is abundant in the sedimentary record only from the late Cretaceous
onwards (Moldowan et al., 1994), so its presence in the Transvaal Supergroup
bitumens would be clear evidence of contamination by younger hydrocarbons.
Identification based solely on the m/z 412 → 191 transition is insufficient,
however, since oleanane is known to co-elute with other triterpenoids
including members of the 25-norhopane series, as shown by Dzou et al.
(1999). 25-norhopanes are generally considered to be the products of
microbial degradation of hopanes (Volkman et al., 1983; Peters et al.,
1996), and, owing to the removal of the C10 methyl group, give a strong m/z
177 fragment ion.

We thus re-examined the samples with the potential oleanane peak by
measuring a series of M+ → 177 transitions, and compared the results with
analysis of a biodegraded crude oil from the Llanos Basin of Colombia that
contains abundant 25-norhopanes (Fig. 9A). Inspection of the trace for the
412 → 177 transition revealed the presence of two peaks at the retention
times of the C30 25-norhomohopanes (22S + R). Note the inversion of the
relative contributions of the 22S and 22R epimers between the 412 → 191 and
412 → 177 traces, observed in both the oil and core samples. Examination of
other MRM transitions indicated the presence of C29 25-norhopane and C28
25,30-dinorhopane as well (Fig. 9B), confirming the presence of a
pseudohomologous series of these biomarkers. In fact, the series of
25-norhopanes appeared to be present at a low level in all samples analyzed.
We conclude that the small peak preceding the C30 αβ hopane in the 412 → 191
trace arises largely from the presence of 25-nor-αβ-homohopane-22R, rather
than from oleanane. While the observation of the M+ → 177 series makes a
25-norhomohopane the favored candidate for the source of at least part of
this signal, other terpenoids (including ring E-methylated 28-norhopanes and
the C ring-enlarged C31-22S 14α-homo-26-nor-17α(H)-hopane) are known to
elute in this region under certain chromatographic conditions (Nytoft et al.,
2002). At present, there is insufficient extract from our samples to conduct
analyses that would definitively identify the compound or compounds
constituting the peak in question. Besides this peak, no other potentially
plant-derived signals have been identified in the late Archean bitumens.

Fig. 9. (A) MRM GC-MS-MS traces from analyses of an Archean core sample
(lower) and a biodegraded oil sample from the Cretaceous-Miocene Llanos
Basin, Colombia (upper). The peak indicated with a star, observed in a
number of the late Archean bitumens, prompted further investigation (see
Section 5.2). (B) MRM GC-MS-MS chromatograms of M+ → 177 transitions showing
the presence of a 25-norhopane series.

The presence of 25-norhopanes superficially suggests that the Transvaal
bitumens were subjected to some degree of biodegradation earlier in their
geological history. Alternatively, the series of 25-norhopanes could reflect
an original microbial source (Blanc and Connan, 1992; Chosson et al., 1992)
with the series becoming enriched relative to the more common hopane series
over time. In the present case, the origins of the 25-norhopanes cannot be
ascribed: they may derive from original microbial input or from earlier in
situ biodegradation or, possibly, they are somewhat more thermodynamically
stable and resistant to cracking.

Beyond the absence of diagnostic markers of later-evolving organisms, the
biomarker composition of the late Archean bitumens would be very unusual if
detected in a Phanerozoic setting. For example, the Transvaal Supergroup
bitumens have relatively large amounts of C28 steranes (30-40% of total
C27-29; Fig. 7), which are typically <25% of the total in Paleozoic oils
(Grantham and Wakefield, 1988; Knoll et al., 2007). Mesozoic and younger
bitumens contain more C28 steranes, owing to the rise to ecological
prominence of the chlorophyll c-containing phytoplankton (Knoll et al.,
2007), but also usually contain plant markers (especially oleanane in
Cretaceous and younger systems) and/or abundant dinosteranes, none of which
were detected here. The high relative abundance of 2-methylhopane, exceeding
10% of C30 hopane in a number of samples of various lithologies (Fig. 6A),
is also a signature more typical of Precambrian deposits than Phanerozoic
ones (Knoll et al., 2007). Taken together, the constellation of biomarkers
seen in the Transvaal Supergroup bitumens is inconsistent with a source of
Phanerozoic age. It is thus difficult - merely on compositional grounds - to
construct a scenario by which the bitumen in the Transvaal strata could have
migrated from a much younger source.
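The compositional argument above combines several simple checks against the thresholds quoted in the text. A minimal sketch of that bookkeeping follows; the function name, its inputs and the example peak areas are hypothetical, and the result is a summary of observations rather than an age classifier, since no single criterion is treated as sufficient.

```python
# Thresholds as quoted in the text; everything else here is illustrative.
PALEOZOIC_C28_MAX_PERCENT = 25.0  # C28 steranes "typically <25%" in Paleozoic oils
HIGH_2MEH_RATIO = 0.10            # 2-methylhopane exceeding 10% of C30 hopane


def compositional_screen(c27, c28, c29, two_meh, c30_hopane,
                         younger_markers_detected):
    """Summarize the compositional observations used in the argument above."""
    sterane_total = c27 + c28 + c29
    if sterane_total <= 0 or c30_hopane <= 0:
        raise ValueError("sterane total and C30 hopane must be positive")
    c28_percent = 100.0 * c28 / sterane_total
    two_meh_ratio = two_meh / c30_hopane
    return {
        "c28_percent": c28_percent,
        "c28_above_paleozoic_range": c28_percent > PALEOZOIC_C28_MAX_PERCENT,
        "two_meh_ratio": two_meh_ratio,
        "two_meh_high": two_meh_ratio > HIGH_2MEH_RATIO,
        # True if oleanane, dinosteranes or other plant markers were found.
        "younger_markers_detected": bool(younger_markers_detected),
    }


# Hypothetical peak areas in arbitrary units (not data from this study).
result = compositional_screen(c27=36.0, c28=34.0, c29=30.0,
                              two_meh=12.0, c30_hopane=100.0,
                              younger_markers_detected=False)
```

High C28 sterane content without accompanying plant markers or dinosteranes, together with abundant 2-methylhopane, is the combination the text describes as hard to reconcile with a Phanerozoic source.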

5.3. Correlation between cores GKF and GKP

An important aspect of the Agouron Griqualand drilling project, unique among
Precambrian Earth-history drilling efforts, is the recovery of long
intervals of correlated stratigraphy from geographically distant parts of a
single depositional system. Cores GKF and GKP, drilled 24 km apart, are well
correlated over much of their lengths, representing ca. 200 million years of
deposition of the Transvaal Supergroup. The basis for these correlations and
detailed discussion of the depositional environment are presented elsewhere
(Schröder et al., 2006; Sumner et al., unpublished). These correlated
sections provide a singular opportunity to look for chemostratigraphic
patterns in biomarker contents. The late Archean bitumens, while of a
uniformly high maturity, do exhibit enough compositional heterogeneity that
variations along the stratigraphy are potentially informative.

When the molecular compositions of bitumens from GKF and GKP are compared
down-core, the stratigraphic correspondence in biomarker variation is
striking. The values of many biomarker parameters from samples correlated on
sedimentological grounds plot nearly on top of one another, even as they
vary between strata by amounts significantly greater than the analytical
precision (Figs. 2, 5 and 8). The correspondence is particularly notable in
the upper sections of the core over the Kuruman, Klein Naute, and uppermost
Nauga formations, where β/α dimethylnaphthalene (Fig. 2C), Ts/Tm (Fig. 5A),
28,30-dinorhopane/C30 hopane (Fig. 5B), C30 30-norhopane/hopane (Fig. 5F)
and each of C27-C29 diasterane/regular sterane (Fig. 8A-C) ratios all track
exceptionally closely over more than 100 m of stratigraphy. The two cores
sample similar deepwater depositional environments through this interval,
and so likely had similar organic matter inputs, and the lithologies of the
host rocks are similar, creating similar physicochemical environments for
diagenesis. While one expects similar biomarker signatures from similar
depositional settings with similar organic-matter sources, finding such
correspondence preserved over 2.5 billion years in trace quantities of
molecular fossils is remarkable.
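The statement that correlated samples plot nearly on top of one another while varying between strata by more than the analytical precision can be expressed as a comparison of three quantities. The sketch below assumes one way of making that comparison; the parameter values and the precision figure are hypothetical, not measurements from cores GKF and GKP.

```python
def mean_abs_offset(core_a, core_b):
    """Mean absolute difference between stratigraphically correlated sample pairs."""
    if len(core_a) != len(core_b) or not core_a:
        raise ValueError("need equal, non-empty lists of correlated samples")
    return sum(abs(a - b) for a, b in zip(core_a, core_b)) / len(core_a)


def downcore_range(values):
    """Total spread of a parameter across all strata sampled."""
    return max(values) - min(values)


# Hypothetical values of one biomarker ratio at four correlated horizons.
gkf = [0.60, 0.95, 1.30, 0.80]
gkp = [0.62, 0.93, 1.27, 0.83]
precision = 0.05  # assumed analytical precision, in the same units

offset = mean_abs_offset(gkf, gkp)
span = downcore_range(gkf + gkp)

# The cores "track" if they agree within precision while the parameter
# itself varies between strata by more than that precision.
tracks = offset <= precision < span
```

When the between-strata spread is itself comparable to the precision, agreement between cores carries little information, which is the situation described for some parameters in the lower parts of the cores.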

In the lower portion of the cores, from the Nauga Formation down to the
lowermost correlated samples in the Monteville, there is more inter-core
variation and the bitumen compositions of the two cores diverge in certain
respects. The lower parts of the cores represent less similar depositional
settings, with GKF comprising relatively more marginal and GKP more basinal
facies. However, nowhere do we detect gross compositional differences that
suggest radically different sources of organic matter to the sites of the
two cores, and a number of biomarker ratios (e.g., Fig. 5C-E, Fig. 8E) do
correlate, albeit within a small range of variation comparable to the
analytical precision. Some parameters (e.g., Figs. 2C and D, 5F, and 8D and
F) show apparently uncorrelated variation around a constant mean. A few
consistent differences, however, can be discerned between the lower sections
of GKF and GKP, and these may be related to their relative positions on the
platform margin.

Below the upper Nauga, GKP has lower diasterane/regular sterane ratios
(Fig. 8A-C), higher relative amounts of 28,30-dinorhopane (Fig. 5B) and
lower Ts/Tm ratios (Fig. 5A) than GKF. C29 Ts/hopane ratios, analogous to
Ts/Tm, show similar behavior (data not shown). This is in spite of similar
patterns of other hopanes (C29 and C30 moretane/hopane; Fig. 5C and E) and
steranes (αββ/ααα and 20S/20R; Fig. 8D and E) in a majority of samples.
Several compositional differences are also apparent: GKP has relatively more
steranes (at the expense of cheilanthanes; Fig. 4), higher relative C28
sterane content (Fig. 7) and less 2α-methylhopane (Fig. 6A) than GKF and
less variable ratios of some cheilanthanes such as C23/C20 (Fig. 6C). These
differences, while not large, suggest that there may have been a
cross-platform gradient in microbial community composition and, more
tentatively, that sedimentary redox conditions at the location of GKP may
have been more reducing, up until partway through the deposition of the
Nauga Formation.

The consistency down-core of the differences in C27-C29 diasterane and
28,30-DNH (Fig. 5B) contents between GKF and GKP is particularly striking
and, perhaps, records a subtle sedimentary or environmental distinction that
decreased over the time of platform deposition. The biomarker 28,30-DNH is
typically abundant in highly reducing environments (Grantham et al., 1980;
Peters et al., 2005). However, in the case of immature Miocene California
Borderland sediments, where the whole sedimentary section records a reducing
paleoenvironment, the abundance of 28,30-DNH relative to the total hopane
content is inversely correlated with the content of clay minerals (Brincat
and Abbott, 2001) whose presence should promote the formation of
diasteranes. In accord with this pattern, below the uppermost Nauga
Formation the relative contents of diasteranes and Ts in GKF increase
compared to the correlated intervals in GKP while 28,30-DNH is relatively
more abundant in GKP.

The observed chemostratigraphic relationships between correlated samples in
GKF and GKP provide compelling evidence for syngeneity. The bed-to-bed
correlation is especially close in the deepwater facies in the upper ~350 m
of each core, and the lower portions show compositional differences that may
reflect their relative positions on the Campbellrand Platform margin. These
relationships can only arise when the organic contents of rocks are
depositionally controlled and not subsequently overprinted by hydrocarbon
migration or anthropogenic contamination.

5.4.  Composition  of  bitumens  I  and  II 

To further test the syngeneity of the Archean bitumens, seven samples from
core GKF (which had already been solvent extracted to prepare bitumen I)
were treated with acids (HCl and HF) to dissolve and disaggregate much of
the mineral matter and were then re-extracted to yield a fraction termed
bitumen II (see Section 3.5). In contrast to bitumen I (the bitumen
extractable from whole-rock powder), bitumen II represents material that was
not solvent-accessible in situ or even after mechanical crushing. While its
precise microscale distribution is unclear, bitumen II is certainly more
closely associated with the kerogen and mineral matrices of the rock than is
bitumen I. Additionally, since contaminants (anthropogenic or migrated) are
necessarily mobile, they are more likely to be found in the more
solvent-accessible bitumen I. Hence, where bitumen I and II extracts from
the same rock show similar compositions, they are likely derived from
syn-depositional organic matter, with the caveat that bitumen II may be
enriched in compounds produced by mineral-surface-catalyzed diagenetic
reactions and derived from maturing kerogen.

The compositions of bitumens I and II from the GKF core samples appear
similar in many respects. Yields of the two bitumens are in the same
tens-to-hundreds of ppb range (Table 1); acid demineralization does not
liberate a much larger quantity of solvent-extractable bitumen. The gross
molecular composition of the two bitumens is the same, dominated by
low-molecular-weight saturated and aromatic hydrocarbons (Fig. 1). Bitumen
II contains cyclic terpenoid biomarkers at slightly higher concentrations
than bitumen I (Table 1). The carbon-number distributions of steranes
(Fig. 7), hopanes (Fig. 5D) and cheilanthanes (Fig. 6C) fully overlap
between bitumens I and II, and moretanes and norhopanes show very similar
relative abundances (Fig. 5B-D and F), consistent with the biomarkers in the
two pools of organic matter deriving from the same source. Crucially, the
thermal maturities of the two bitumens inferred from the biomarker isomer
distributions are equivalent - an essential and characteristic quality of
syngenetic organic matter. Evidence for equivalent thermal maturities
includes similar values in bitumen I and II extracts for: hopanoid Ts/Tm
ratios (Fig. 5A) and extent of βα-to-αβ isomerization around C17 and C21 for
C29 and C30 homologues (moretane/hopane; Fig. 5C and E), as well as ratios
of C27-29 αββ to ααα steranes (Fig. 8D) and epimerization around C20
(Fig. 8E). The down-core variability for several of these indices tracks
closely between bitumens I and II as well; this is good evidence that
organic matter source and depositional environment, rather than later
overprinting by contaminants, control differences in bitumen composition
between strata. Taken together, the consistent similarities in molecular
composition, thermal maturity and patterns of stratigraphic variation
between bitumens I and II are strong evidence for syngeneity.

Furthermore, the compositional differences between the two bitumen extracts
are in fact of the kind one might expect in comparing pools of organic
matter that have the same source, but are more- versus less-closely
matrix-associated in the same rock. One example is the higher
diasterane/regular sterane ratios in bitumen II than bitumen I, especially
in the upper ~1000 m of core GKF, above the Monteville Formation
(Fig. 8A-C). Diasteranes are thought to be generated from sterenes by
concerted methyl-rearrangement reactions that are catalyzed by acidic sites
on clay minerals such as montmorillonite and illite (Rubinstein et al.,
1975; Sieskind et al., 1979; van Kaam-Peters et al., 1998). Hence, the more
closely mineral-associated bitumen II, having greater access to clays in the
rock matrix, would plausibly be enriched in diasteranes relative to bitumen
I. The two samples from the Monteville Formation, however, show nearly
identical diasterane/regular sterane ratios. This may be related to a
distinct clay mineralogy or style of organic sedimentation in the
Monteville; in any event, it points toward depositional control of organic
matter composition and diagenesis. As noted above in comparing between GKF
and GKP, these down-core patterns are consistent in bitumens I and II
between C27, C28 and C29 (dia)steranes. Both similarities and contrasts in
molecular composition between bitumens I and II indicate the hydrocarbons’
derivation from syn-depositional organic matter.

5.5.  Contrasting  composition  of  overlying  Dwyka  diamictite 

One unintended - but unexpectedly useful - aspect of the Griqualand drilling
was the intersection (in both the GKF and GKP boreholes) of the Dwyka
Formation, a Permo-Carboniferous diamictite that was deposited over a large
portion of southern Africa (Visser, 1989). This Phanerozoic glacial deposit
overlies the late Archean Transvaal sedimentary sequence above a
2-billion-year unconformity with the Kuruman Formation. While not the target
of the Agouron drilling project, the Dwyka was intersected in the two cores
to depths approaching 200 m, well below the modern weathering horizon. The
recovery of unweathered Phanerozoic sedimentary rock in the same borehole
provides another opportunity to test the syngeneity of the Archean bitumens,
since the organic matter in the Dwyka has very different sources and thermal
history from the Transvaal sediments, yet the core samples were all exposed
to the same drilling equipment, handling and storage conditions, and
chemical analyses. Pervasive contamination during drilling and analysis
would obliterate the contrast that should be apparent across the
Dwyka/Kuruman unconformity. On the other hand, if the very different
character of the Dwyka organic matter were observed in analyses of the
Griqualand cores, this would be strong evidence that anthropogenic
contamination did not overprint syngenetic organic signatures in the core
samples.

By nearly every metric of molecular composition, the Dwyka
samples from the Griqualand cores are, in fact, distinct from the late
Archean rocks directly below. Both saturate and aromatic hydrocarbon
fractions show differences, including higher pristane/phytane
and lower phytane/n-C18 ratios (Fig. 2A and B), strong even-over-odd
alkane preferences in the GKF Dwyka samples, and much higher
contents of thermodynamically disfavored α-substituted
dimethylnaphthalenes and methylphenanthrenes (Fig. 2C and D). Bitumen
from the Dwyka also contains less thermally stable molecules, such
as anthracene (Smith et al., 1995), that are not detected in the
Transvaal.

The contrast is especially clear in cyclic biomarker contents,
which uniformly indicate a significant difference in source and thermal
history across the unconformity. The biomarkers in two Dwyka
samples are strongly hopane-dominated, while the third is less so
but still outside the range of the Archean samples (Fig. 4). Organic
matter from the Dwyka appears rather heterogeneous, which is
expected in a clastic glaciogenic deposit. Consistent with this depositional
setting, the studied core samples contained numerous
large, irregularly shaped clasts with dissimilar lithologies, indicating
multiple sedimentary sources. The sterane carbon-number
distributions of the Archean samples define a small field in Fig. 7,
while those of the Dwyka vary much more widely, with the GKP
sample plotting just on the edge of the Archean field and the two
GKF samples completely distinct from it. Their relatively low content
of C28 steranes is also a typically Paleozoic signature. Even
more striking are the sharp contrasts in biomarker maturity parameters
across as little as 15 m of stratigraphy, shown in Figs. 5 and 8.
By every measure, the organic matter in the Dwyka is far less
thermally mature than the Archean sedimentary rocks just a few
meters below. Evidence includes: much lower Ts/Tm and higher
βα/αβ hopane ratios (Fig. 5A, C and D), lower diasterane/regular
sterane and lower sterane αββ/ααα ratios (Fig. 8A-D), and the
detection in the Dwyka of thermodynamically less-stable isomers,
such as 17β-hopanes, which are absent from the Transvaal rocks.
Carbon-number distributions of hopanes (Fig. 5D) and cheilanthanes
(Fig. 6C) are also distinct.
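
The sterane carbon-number comparison described above depends on normalizing the C27, C28 and C29 abundances of each sample so that they sum to 100%, which places the sample at a single point in a ternary diagram. A minimal sketch of that normalization in Python follows. It assumes integrated peak areas are available for each carbon number; the function name and the input values are hypothetical illustrations, not data or methods from this study.

```python
def sterane_ternary(c27, c28, c29):
    """Normalize C27, C28 and C29 sterane abundances to percentages.

    Inputs are assumed to be integrated peak areas in the same
    (arbitrary) units; the three returned values sum to 100.
    """
    total = c27 + c28 + c29
    if total <= 0:
        raise ValueError("sterane abundances must sum to a positive value")
    return tuple(100.0 * x / total for x in (c27, c28, c29))


# Hypothetical peak areas, for illustration only (not measured values).
sample = sterane_ternary(40.0, 20.0, 40.0)
print(sample)
```

Because only relative proportions are retained, samples with very different absolute biomarker yields can be compared on the same diagram.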

Clearly, the sharp compositional and maturity contrasts
between the Dwyka and the underlying late Archean rocks are
preserved in both cores. These contrasts constitute an original,
unconformity-related relationship and the process of recovering
and analyzing the core material has not compromised this signal.
The distinct and immature composition of the Dwyka organic
matter also precludes it from being the source of the bitumens in
the Transvaal strata. The detection of this demonstrably syngenetic
characteristic of the organic matter in both GKF and GKP cores provides
additional strong evidence that the biomarkers in the late
Archean bitumens are not contaminants.

6. Discussion

6.1. Late Archean molecular fossils and the antiquity of microbial
diversity

Taken together, we find the evidence for the syngenetic nature
of the Transvaal bitumens (Section 5) to be compelling. As they are
syngenetic, and thus late Archean in age, the bitumens constitute
a molecular fossil record of microbial activity during the deposition
of the Transvaal Supergroup. The Transvaal sediments, like
many Precambrian sequences, bear a number of signs of microbial
involvement in their deposition. Numerous sedimentary textures
can be attributed to microbial activity (Schröder et al., this volume)
and large amounts of organic matter with textures and isotopic
compositions consistent with a biogenic origin are present (Fischer
et al., this volume), so there is no doubt that vigorous marine
ecosystems were functioning widely by the late Archean, if not
much earlier. A molecular fossil record aids our understanding of
this period by contributing information about the specific identities
and physiologies of the microorganisms active in late Archean
oceans.

The late Archean molecular fossil record from this study and others
of early Precambrian rocks (Brocks et al., 2003b; Eigenbrode,
2004; Dutkiewicz et al., 2006) is the most specific evidence at
present for the antiquity of the three principal Domains of cellular
life: the Bacteria, Eukarya and Archaea. The hopanoid fossils
detected throughout the Griqualand cores are bacterial in origin;
in extant organisms extended C35 hopanoids are exclusive to bacteria
(Rohmer et al., 1984), and it is virtually certain that the bulk
of other hopanoids in late Archean bitumens are bacterial (rather
than eukaryotic) as well. The fossil steroids detected in late Archean
rocks have been more controversial, and their interpretation as
molecular fossils is discussed in detail below (Section 6.2). We
consider the suite of steranes present in the Transvaal bitumens
from these cores to be strong evidence for the presence of eukaryotes
in late Archean coastal oceans. In contrast to another report
(Ventura et al., 2007), molecular fossils specifically diagnostic for
the Archaea, such as mid-chain cyclized or head-to-head-linked
>C20 isoprenoids, were not detected. These extended isoprenoids,
even if they were initially present, are unlikely to have survived the
long thermal and diagenetic history that the Transvaal Supergroup
has experienced. The finding of characteristic fossils of the other
two Domains, however, makes the presence of at least stem-group
archaea by the late Archean a likely scenario. Indirect geochemical
evidence for widespread methanogenesis even earlier in Earth
history (Hayes, 1983) is further support for Archean archaea. The
consensus emerging from multiple, independent molecular fossil
studies of the late Archean rock record supports the presence
of members of all three Domains by at least 2.7 billion years
ago.

Below Domain level, few molecular fossils have been detected
that carry taxonomic specificity. Possible exceptions include the
A-ring methylated hopanes, where 2α-methylhopanes are provisionally
indicative of cyanobacterial primary productivity while
the 3β-methylhopanes are likely proteobacterial and possibly from
methanotrophic γ-proteobacteria. Gammacerane (fossil tetrahymanol),
while known to originate from some ciliates, is also
probably proteobacterial.

6.2.  Interpretation  of  late  Archean  steranes 

Perhaps the most contentious claim to arise from molecular
fossil analysis of late Archean organic matter has been the interpretation
of steranes as indicative of the presence of eukaryotes
in marine environments as far back as 2.7 Ga (Brocks et al., 1999,
2003b). These molecular fossils are almost a billion years older than
the earliest convincingly eukaryotic body fossils (Knoll et al., 2007),
although a recent report offers evidence for an earlier appearance
(Zang, 2007). The presence of steranes in Archean sediments suggests
(though does not require) that cardinal events in eukaryotic
evolution took place very early in Earth history. Furthermore, since
steroid biosynthesis requires molecular oxygen, the presence of fossil
steranes in late Archean rocks implies that O2 was available in at
least some aquatic ecosystems several hundred million years before
the first geological and geochemical evidence for persistent atmospheric
O2, ca. 2.4 Ga (Holland, 2006; Summons et al., 2006). Since
the only environmentally significant source of O2 is oxygenic photosynthesis,
fossil steranes also imply that oxygenic photosynthesis
arose long before the Paleoproterozoic appearance of atmospheric
O2.

This set of interpretations of Archean steranes has been questioned
on several grounds. First, the syngeneity of the sterane fossils
has been doubted. The evidence presented herein (Section 5) for the
syngeneity of Archean molecular fossils, including steranes, is, in
our opinion, the strongest yet. We have demonstrated stratigraphic
and compositional relationships in cleanly recovered drill-core
samples that are inconsistent with anthropogenic or geologically
recent contamination. Second, some (Raymond and Blankenship,
2004; Kopp et al., 2005) have proposed that steroid biosynthesis
might have been conducted by an ancestral, anaerobic pathway
that was subsequently “replaced” by the modern, O2-dependent
route. These proposals have been addressed in detail by Summons
et al. (2006); we will not repeat that discussion here, except to say
that we do not consider the arguments for a hypothetical anaerobic
steroid biosynthetic pathway to be convincing. Evidence for the
antiquity of the known aerobic pathway is robust, and its operation
during the late Archean remains the most probable and parsimonious
explanation for the presence of steranes in rocks of that age.

A third objection has been that a very few bacteria do produce
steroids de novo (Bird et al., 1971; Bode et al., 2003; Pearson et al.,
2003) and so the steranes in Archean rocks could have a prokaryotic
source (Cavalier-Smith, 2006). Aspects of the phylogenetic distribution
of steroid biosynthesis among the bacteria and the genes
involved are discussed by Summons et al. (2006). It has been noted
(Brocks and Summons, 2003; Volkman, 2005) that no prokaryote is
known to be able to produce the C28 and C29 24-alkyl steroid structures
detected in abundance in the late Archean rocks. Additionally,
the majority of prokaryotic steroids are 4-methyl and 4,4-dimethyl,
resulting from the typically abbreviated steroid biosynthetic pathways
of these few bacteria. A dominance of 4-methyl steroids was
found in the Mesoproterozoic (1.64 Ga) Barney Creek Formation,
likely deposited under strongly euxinic conditions and reflecting
minimal eukaryotic activity in highly sulfidic waters (Brocks et al.,
2005). As noted above, 4-methyl steranes were not observed in the
Griqualand core material, despite efforts taken specifically to detect
them, suggesting that 4-desmethyl (i.e., more typically eukaryotic)
steroids were a much larger component of the sedimentary organic
matter.

To these biosynthetic considerations, analysis of the two
Agouron cores adds an ecological dimension: steranes with all the
hallmarks of syngenetic molecular fossils are found consistently
over more than 2500 m of core that stratigraphically represent
depositional environments from shallow marine to deep slope. The
source organisms for the fossil steranes must therefore have had
a broad ecological distribution in the coastal ocean and have contributed
substantially to organic matter input to sediments over
such a range of conditions. The source organisms must also have
maintained that ecological distribution and organic matter production
throughout the ~200 million years of geological history
represented in the cores. None of the known bacterial steroid producers
has such an ecological distribution or prominence in the
fossil record. Arguments for a prokaryotic origin for fossil steranes
must account for both biosynthetic and ecological discrepancies
between the established biology of steroid-producing bacteria and
the fossil record, or else postulate extinct biochemistries and ecologies
for putative marine steroid-producing bacteria for which all
other evidence has been lost.

The range of sterane skeletons observed suggests that a number
of different steroid synthesis pathways were operative in this
coastal marine environment, at the very least with regard to side-chain
alkylation. The carbon-number distributions (C27/C28/C29;
Fig. 7) of steranes in the late Archean bitumens are suggestive of
the simultaneous presence of multiple protistan sources. In the
absence of a more continuous Precambrian body and molecular
fossil record, it is premature to interpret these carbon-number
patterns in terms of the predominance of particular protistan
groups, as has been done successfully over much of the Phanerozoic
(Grantham and Wakefield, 1988; McCaffrey et al., 1994; Holba
et al., 1998; Knoll et al., 2007). We cannot say definitively whether
these eukaryotes were photo- or heterotrophic - which would put
a constraint on the timing of primary plastid endosymbiosis - as
steroid synthesis appears to have been present in the last common
ancestor of all extant eukaryotes (Summons et al., 2006). Whatever
the metabolic modes of their producers, the finding of steranes
throughout the Transvaal Supergroup strata attests to a broad ecological
distribution and persistence of the source organisms. No
sterane fossils are found that cannot be attributed to recognized
extant biosynthetic pathways, nor are there ‘orphan’ molecules
indicative of extinct steroid biochemistry. Remarkably, much of
steroid biosynthesis - at least with regard to ring-system formation
and side-chain modification - appears to have been operative
by at least 2.6 Ga.

It seems unlikely that the relative amounts of biomarker classes
- such as sterane/hopane ratios - could be used to infer relative
contributions to sedimentary organic matter or community abundance
of groups of organisms, such as eukaryotes versus bacteria,
in these ancient and overmature bitumens. On organic geochemical
grounds, not enough is known about the relative stabilities of
different biomarker structures once such high degrees of thermal
maturity and extreme ages are reached, and small differences will
become magnified over very long alteration histories. On biosynthetic
grounds, while the vast majority of free-living eukaryotes
produce steroids, only an unquantified minority of bacteria make
hopanoids, and this proportion is almost certainly not constant
over different environmental conditions or Earth history. The steranes
detected in the Transvaal Supergroup bitumens indicate the
presence of eukaryotes, but do not necessarily assign to them any
specific ecological prominence.

In summation, the steranes detected throughout the Griqualand
cores have all the hallmarks of syngenetic molecular fossils
of eukaryotes. Their isomeric distributions are fully consistent with
the high thermal maturity expected of this ancient organic matter.
Their stratigraphic distribution, particularly their correlation
between two separate boreholes and sharp maturity contrast with
overlying rocks, is a geochemical signature difficult to reconcile
with scenarios of contamination by younger migrated hydrocarbons
or during drilling and analysis. The sterane structures detected
in the Archean bitumens are typical of those produced by eukaryotes,
unknown (or at best highly atypical) among prokaryotes, and
occur over a range of paleoenvironments inconsistent with present
knowledge of the ecology of steroid-producing bacteria. Current
understanding of the chemistry of steroid biosynthesis suggests
that molecular oxygen is absolutely required, especially for production
of the 4,14-desmethyl steranes that dominate the late Archean
(and most younger) bitumens (Summons et al., 2006). Hence, this
molecular fossil record provides simultaneous evidence for the
three Domains of life, oxygenic photosynthesis and the anabolic
use of O2.


7.  Conclusions 

The Agouron Griqualand drilling project has recovered relatively
well-preserved organic-rich strata from the Transvaal Supergroup,
and has done so without overprinting syngenetic molecular fossil
signatures with contaminant hydrocarbons. These drill cores
have enabled stratigraphic comparisons of biomarker contents that
support their syngenetic nature. The suite of molecular fossils identified
in the late Archean bitumens includes hopanes attributable to
bacteria, potentially including cyanobacteria and methanotrophs,
and steranes almost certainly of eukaryotic origin. This work is
in accord with other reports (Brocks, 2001; Brocks et al., 2003b;
Eigenbrode, 2004; Dutkiewicz et al., 2006) of early Precambrian
molecular fossils and extends those findings to the Kaapvaal craton.

What, we might ask, would constitute yet stronger evidence for
microbial (including eukaryotic) diversity in the Archean? There is
always the hope that some sort of Lagerstätte might yet be found,
a trove of spectacularly well preserved fossils - morphological
or molecular - of Archean microbes. Hope of such finds rests on
continued exploration of the Archean rock record. The Agouron Griqualand
drilling project and other Earth-history drilling efforts have
highlighted the value of extending such exploration to the subsurface.
Beyond the perhaps faint hope of finding a pocket of lightly
altered, thermally immature Archean sediment, drilling projects
recover long stretches of continuous stratigraphy, unavailable in
outcrop, that allow multiproxy geochemical records such as the
ones presented in this volume to be constructed from material
whose provenance is controlled.

The Archean paleobiological record, incomplete and problematic
as it is, is beginning to yield a picture of the early phases of
evolutionary history. An emerging finding is that much of cellular
biochemistry, including diverse metabolic pathways of electron
transport, photosynthesis, carbon fixation, nutrient assimilation
and lipid biosynthesis, appears to have arisen very early in Earth
history. The Archean might be viewed as the cardinal epoch of
biochemical innovation, resulting in most of the molecular toolkit
on which a wide spectrum of biological form and function would
eventually be based. This possibility makes understanding of the
Archean surface environment - the physicochemical context in
which this innovation took place - all the more essential.

Acknowledgements 

We thank the Agouron Institute and members of the Griqualand
Drilling Project team for provision of the cores. Alex Birch oversaw
the clean drilling operation and Francis McDonald provided field
support. We are especially grateful to Nic Beukes and Joe Kirschvink
for discussions of geologic context and for sampling logistics, and
to Gordon Love for discussions of biomarker analyses. Funding
support for this work came from the NASA Exobiology Program
(Grants NNG05GN62G and NNG04GJ13G), NSF (EAR0418619) and
the Agouron Institute. JRW receives support through an NDSEG Fellowship
from the Office of Naval Research and a Graduate Research
Fellowship from the National Science Foundation. John Zumberge
of GeoMark Research provided crude oil samples used as reference
materials. Carolyn Colonero provided laboratory assistance, particularly
in the maintenance and operation of the mass spectrometers.
Thanks to Andrew Knoll for helpful comments, and to reviewers
Simon George and Roger Buick for suggestions that improved the
manuscript.

References 

Allwood, A.C., Walter, M.R., Kamber, B.S., Marshall, C.P., Burch, I.W., 2006. Stromatolite
reef from the Early Archaean era of Australia. Nature 441, 714-718.



J.R.  Waldbauer  et  al.  /  Precambrian  Research  169(2009)28-47 


Altermann, W., Kazmierczak, J., Oren, A., Wright, D.T., 2006. Cyanobacterial calcification
and its rock-building potential during 3.5 billion years of Earth history.
Geobiology 4, 147-166.

Armstrong, R.A., Compston, W., Retief, E.A., Williams, I.S., Welke, H.J., 1991. Zircon
ion microprobe studies bearing on the age and evolution of the Witwatersrand
triad. Precambrian Research 53, 243-266.

Barton, E.S., Altermann, W., Williams, I.S., Smith, C.B., 1994. U-Pb zircon age for a tuff
in the Campbell Group, Griqualand West Sequence, South Africa: implications
for Early Proterozoic rock accumulation rates. Geology 22, 343-346.

Beukes, N.J., 1987. Facies relations, depositional environments and diagenesis in
a major Early Proterozoic stromatolitic carbonate platform to basin sequence,
Campbellrand Subgroup, Transvaal Supergroup, Southern Africa. Sedimentary
Geology 54, 1-46.

Beukes, N.J., Smit, C.A., 1987. New evidence for thrusting in Griqualand West, South
Africa: implications for stratigraphy and the age of red beds. South African Journal
of Geology 90, 378-394.

Beukes, N.J., Evans, D.A.D., Grotzinger, J.P., Kirschvink, J.L., Knoll, A.H., Sumner,
D.Y., 2004. Multidisciplinary study of the Precambrian biosphere and surficial
oxygenation, Kaapvaal Craton, South Africa: the Agouron Cores. International
Journal of Astrobiology 3 (Suppl. 1), 15.

Bird,  C.W.,  Lynch,  J.M.,  Pirt,  F.J.,  et  al.,  1971.  Steroids  and  squalene  in  Methylococcus 
capsulatus  grown  on  methane.  Nature  230, 473-474. 

Blanc, P., Connan, J., 1992. Origin and occurrence of 25-norhopanes: a statistical
study. Organic Geochemistry 18, 813-828.

Bode, H.B., Zeggel, B., Silakowski, B., Wenzel, S.C., Reichenbach, H., Müller, R., 2003.
Steroid biosynthesis in prokaryotes: identification of myxobacterial steroids and
cloning of the first bacterial 2,3(S)-oxidosqualene cyclase from the myxobacterium
Stigmatella aurantiaca. Molecular Microbiology 47, 471-481.

Bosak, T., Greene, S.E., Newman, D.K., 2007. A likely role for anoxygenic photosynthetic
microbes in the formation of ancient stromatolites. Geobiology 5, 119-126.

Bravo, J.M., Perzl, M., Hartner, T., Kannenberg, E.L., Rohmer, M., 2001. Novel
methylated triterpenoids of the gammacerane series from the nitrogen-fixing
bacterium Bradyrhizobium japonicum USDA 110. European Journal of Biochemistry
268, 1323-1331.

Brincat, D., Abbott, G.D., 2001. Some aspects of the molecular biogeochemistry of
laminated and massive rocks from the Naples Beach Section (Santa Barbara-
Ventura Basin). In: Isaacs, C.M., Rullkötter, J. (Eds.), The Monterey Formation:
From Rocks to Molecules. Columbia University Press, New York, pp. 140-149.

Brocks, J.J., 2001. Molecular fossils in Archean rocks. Ph.D. Thesis, Sydney University,
Sydney, Australia.

Brocks,  J.J.,  Summons,  R.E.,  2003.  Sedimentary  hydrocarbons,  biomarkers  for  early 
life.  In:  Holland,  H.,  Turekian,  K.K.  (Eds.),  Treatise  on  Geochemistry.  Pergamon, 
Oxford,  pp.  63-115. 

Brocks,  J.J.,  Logan,  G.A.,  Buick,  R.,  Summons,  R.E.,  1999.  Archean  molecular  fossils  and 
the  early  rise  of  Eukaryotes.  Science  285, 1033-1036. 

Brocks, J.J., Buick, R., Logan, G.A., Summons, R.E., 2003a. Composition and syngeneity
of molecular fossils from the 2.78 to 2.45 billion-year-old Mount Bruce Supergroup,
Pilbara Craton, Western Australia. Geochimica et Cosmochimica Acta 67,
4289-4319.

Brocks, J.J., Buick, R., Summons, R.E., Logan, G.A., 2003b. A reconstruction of Archean
biological diversity based on molecular fossils from the 2.78 to 2.45 billion-year-old
Mount Bruce Supergroup, Hamersley Basin, Western Australia. Geochimica
et Cosmochimica Acta 67, 4321-4335.

Brocks, J.J., Love, G.D., Summons, R.E., Knoll, A.H., Logan, G.A., Bowden, S.A., 2005.
Biomarker evidence for green and purple sulphur bacteria in a stratified Palaeoproterozoic
sea. Nature 437, 866-870.

Brocks, J.J., Grosjean, E., Logan, G.A., 2008. Assessing biomarker syngeneity using
branched alkanes with quaternary carbon (BAQCs) and other plastic contaminants.
Geochimica et Cosmochimica Acta 72, 871-888.

Buick, R., 1992. The antiquity of oxygenic photosynthesis: evidence from stromatolites
in sulphate-deficient Archaean lakes. Science 255, 74-77.

Button, A., 1973. The stratigraphic history of the Malmani dolomite in the eastern and
north-eastern Transvaal. Transactions of the Geological Society of South Africa
76, 229-247.

Cavalier-Smith, T., 2006. Rooting the tree of life by transition analyses. Biology Direct
1, doi:10.1186/1745-6150-1-19.

Chosson,  P.,  Connan,  J.,  Dessert,  D.,  Lanau,  C.,  1992.  In  vitro  biodegradation  of  steranes 
and  terpanes:  A  clue  to  understanding  geological  situations.  In:  Moldowan,  J.M., 
Albrecht,  P.,  Philp,  R.P.  (Eds.),  Biological  Markers  in  Sediments  and  Petroleum. 
Prentice  Hall,  Englewood  Cliffs,  NJ,  pp.  320-349. 

Clay, A.N., 1986. The stratigraphy of the Malmani Dolomite Subgroup in the Carletonville
area, Transvaal: genetic implications for lead-zinc mineralization. In:
Anhaeusser, C.R., Maske, S. (Eds.), Mineral Deposits of Southern Africa. Geological
Society of South Africa, Johannesburg, pp. 853-860.

Cloud,  P.,  1973.  Paleoecological  significance  of  the  banded  iron-formation.  Economic 
Geology  68, 1135-1143. 

Cloud, P., Licari, G.R., 1968. Microbiotas of the banded iron formations. PNAS 61,
779-786.

Duane, M.J., Kruger, F.J., 1991. Geochronological evidence for tectonically driven brine
migration during the early Proterozoic Kheis Orogeny of southern Africa. Geophysical
Research Letters 18, 975-978.

Duane, M.J., Kruger, F.J., Roberts, P.J., Smith, G.B., 1991. Pb and Sr isotope and origin of
Proterozoic base metal (fluorite) and gold deposits, Transvaal Sequence, South
Africa. Economic Geology 86, 1491-1505.

Duane, M.J., Kruger, F.J., Turner, A.M., Whitelaw, H.T., Coetzee, H., Verhagen, B.T., 2004.
The timing and isotopic character of regional hydrothermal alteration and associated
epigenetic mineralization in the western sector of the Kaapvaal Craton
(South Africa). Journal of African Earth Sciences 38, 461-476.

Dutkiewicz, A., Ridley, J., Buick, R., 2003a. Oil-bearing CO2-CH4-H2O fluid inclusions:
oil survival since the Palaeoproterozoic after high temperature entrapment.
Chemical Geology 194, 51-79.

Dutkiewicz, A., Volk, H., Ridley, J., George, S.C., 2003b. Biomarkers, brines, and oil in
the Mesoproterozoic, Roper Superbasin, Australia. Geology 31 (11), 981-984.

Dutkiewicz,  A.,  Volk,  H.,  Ridley,  J.,  George,  S.C.,  2004.  Geochemistry  of  oil  in  fluid 
inclusions  in  a  middle  Proterozoic  igneous  intrusion:  implications  for  the  source 
of  hydrocarbons  in  crystalline  rocks.  Organic  Geochemistry  35  (8),  937-957. 

Dutkiewicz,  A.,  Volk,  H..  George,  S.C.,  Ridley,  J.,  Buick,  R..  2006.  Biomarkers  from 
Huronian  oil-bearing  fluid  inclusions:  an  uncontaminated  record  of  life  before 
the  Great  Oxidation  Event.  Geology  34. 437-440. 

Dutkiewicz,  A..  George,  S.C..  Mossman,  D.J..  Ridley,  J.,  Volk.  H..  2007.  Oil  and  its 
biomarkers  associated  with  the  Palaeoproterozoic  Oklo  natural  fission  reactors. 
Gabon.  Chemical  Geology  244, 130-154. 

Dzou,  L.,  Holba,  A.G.,  Ramon,  J.,  Moldowan,  J.M..  Zinniker,  D.,  1999.  Application  of  new 
diterpane  biomarkers  to  source,  biodegradation  and  mixing  effects  on  Central 
Llanos  Basin  oils.  Colombia.  Organic  Geochemistry  30.  515-534. 

Eglinton,  G.,  Scott,  P.M.,  Besky,  T.,  Burlingame,  A.L..  Calvin.  M.,  1964.  Hydrocarbons 
from  a  one-billion-year-old  sediment.  Science  145, 263-264. 

Eigenbrode,  J.L,  2004.  Late  Archean  Microbial  Ecology;  an  integration  of  molecular, 
isotopic  and  lithological  studies.  Ph.D  Thesis.  The  Pennsylvania  State  University. 
State  College,  PA. 

Eigenbrode,  J.L,  2008.  Fossil  lipids  for  life-detection;  a  case  study  from  the  early 
earth  record.  Space  Science  Reviews.  doi:10.1007/sll214-007-9252-9. 

Eigenbrode, J.L., Freeman, K.H., 2006. Late Archean rise of aerobic microbial
ecosystems. PNAS 103 (43), 15759-15764.

Eigenbrode, J.L., Freeman, K.H., Summons, R.E., 2008. Methylhopane biomarker
hydrocarbons  in  Hamersley  Province  sediments  provide  evidence  for 
Neoarchean  aerobiosis.  Earth  and  Planetary  Science  Letters  273, 323-331. 

Fischer, W.W., Pearson, A., 2007. Hypotheses for the origin and early evolution of
triterpenoid  cyclases.  Geobiology  5, 19-34. 

George, S.C., Volk, H., Dutkiewicz, A., Ridley, J., Buick, R., 2008. Preservation of
hydrocarbons and biomarkers in oil trapped inside fluid inclusions for >2 billion years.
Geochimica  et  Cosmochimica  Acta  72,  844-870. 

Grantham, P.J., Wakefield, L.L., 1988. Variations in the sterane carbon number
distributions of marine source rock derived crude oils through geological time.
Organic  Geochemistry  12,  61-73. 

Grantham, P.J., Posthuma, J., DeGroot, K., 1980. Variation and significance of the C27
and C28 triterpane content of a North Sea core and various North Sea crude oils.
In: Douglas, A.G., Maxwell, J.R. (Eds.), Advances in Organic Geochemistry 1979.
Pergamon  Press,  New  York,  pp.  29-38. 

Grosjean, E., Logan, G.A., 2007. Incorporation of organic contaminants into
geochemical samples and an assessment of potential sources: examples from Geoscience
Australia marine survey S282. Organic Geochemistry 38, 853-869.

Grotzinger, J.P., Rothman, D.H., 1996. An abiotic model for stromatolite
morphogenesis. Nature 383, 423-425.

Harvey, H.R., McManus, G.B., 1991. Marine ciliates as a widespread source of
tetrahymanol and hopan-3β-ol in sediments. Geochimica et Cosmochimica Acta 55,
3387-3390.

Hayes, J.M., 1983. Geochemical evidence bearing on the origin of aerobiosis, a
speculative hypothesis. In: Schopf, J.W. (Ed.), Earth’s Earliest Biosphere: Its Origin and
Evolution.  Princeton  University  Press,  Princeton,  NJ,  pp.  291-301. 

Hofmann,  H.,  Grey,  K.,  Hickman,  A.,  Thorpe,  R.,  1999.  Origin  of  3.45  Ga  coniform 
stromatolites  in  Warrawoona  Group,  Western  Australia.  Geological  Society  of 
America  Bulletin  111,  1256-1262. 

Holba, A.G., Dzou, L.I.P., Masterson, W.D., Hughes, W.B., Huizinga, B.J., Singletary, M.S.,
Moldowan, J.M., Mello, M.R., Tegelaar, E., 1998. Application of 24-norcholestanes
for constraining source age of petroleum. Organic Geochemistry 29, 1269-1283.

Holland,  H.D.,  2006.  The  oxygenation  of  the  atmosphere  and  oceans.  Philosophical 
Transactions  of  the  Royal  Society  B  361, 903-915. 

House,  C.H.,  Schopf,  J.W.,  McKeegan,  K.D.,  Coath,  C.D.,  Harrison,  T.M.,  Stetter,  K.O., 
2000. Carbon isotopic composition of individual Precambrian microfossils.
Geology 28, 707-710.

Kleemann,  G.,  Poralla,  K.,  Englert,  G.,  Kjosen,  H.,  Liaaen-Jensen,  S.,  Neunlist,  S., 
Rohmer, M., 1990. Tetrahymanol from the phototrophic bacterium
Rhodopseudomonas palustris: first report of a gammacerane triterpene from a prokaryote.
Journal  of  General  Microbiology  136,  2551-2553. 

Knoll, A.H., 2003. Life on a young planet. Princeton University Press, Princeton, NJ.

Knoll, A.H., Summons, R.E., Waldbauer, J.R., Zumberge, J., 2007. The geological
succession of primary producers in the oceans. In: Falkowski, P., Knoll, A.H. (Eds.),
The Evolution of Primary Producers in the Sea. Academic Press, Boston, pp. 133-163.

Kopp, R.E., Kirschvink, J.L., Hilburn, I.A., Nash, C.Z., 2005. The Paleoproterozoic
snowball Earth: a climate disaster triggered by the evolution of oxygenic
photosynthesis. PNAS 102, 11131-11136.

Martini,  J.E.J.,  1976.  The  fluorite  deposits  in  the  Dolomite  Series  of  the  Marico  District, 
Transvaal,  South  Africa.  Economic  Geology  71, 625-635. 

McCaffrey, M.A., Moldowan, J.M., Lipton, P.A., Summons, R.E., Peters, K.E., Jeganathan,
A., Watt, D.S., 1994. Paleoenvironmental implications of novel C30 steranes in
Precambrian to Cenozoic Age petroleum and bitumen. Geochimica et Cosmochimica
Acta 58, 529-532.

Miyano,  T.,  Beukes,  N.J.,  1984.  Phase  relations  of  stilpnomelane,  ferri-annite,  and 
riebeckite  in  very  low-grade  metamorphosed  iron-formations.  Transactions  of 
the  Geological  Society  of  South  Africa  87, 111-124. 




J.R.  Waldbauer  et  al.  /  Precambrian  Research  169(2009)28-47 




Moldowan, J.M., Dahl, J., Huizinga, B.J., Fago, F.J., Hickey, L.J., Peakman, T.M.,
Taylor, D.W., 1994. The molecular fossil record of oleanane and its relation to
angiosperms.  Science  265,  768-771. 

Nytoft,  H.P.,  Bojesen-Koefoed,  J.A.,  Christiansen,  F.G.,  Fowler,  M.G.,  2002.  Oleanane 
or  lupane?  Reappraisal  of  the  presence  of  oleanane  in  Cretaceous-Tertiary  oils 
and sediments. Organic Geochemistry 33, 1225-1240.

Ourisson, G., Albrecht, P., 1992. Geohopanoids: the most abundant natural products
on Earth? Accounts of Chemical Research 25, 398-402.

Ourisson, G., Rohmer, M., Poralla, K., 1987. Prokaryotic Hopanoids and other
Polyterpenoid Sterol Surrogates. Annual Review of Microbiology 41, 301-333.

Pearson, A., Budin, M., Brocks, J.J., 2003. Phylogenetic and biochemical
evidence for sterol synthesis in the bacterium Gemmata obscuriglobus. PNAS 100,
15352-15357.

Peters, K.E., Moldowan, J.M., McCaffrey, M.A., Fago, F.J., 1996. Selective
biodegradation of extended hopanes to 25-norhopanes in petroleum reservoirs: insights
from molecular mechanics. Organic Geochemistry 24, 765-783.

Peters, K.E., Walters, C.C., Moldowan, J.M., 2005. The Biomarker Guide, Second
Edition. Cambridge University Press, Cambridge, UK.

Rashby, S.E., Sessions, A.L., Summons, R.E., Newman, D.K., 2007. Biosynthesis
of  2-methylbacteriohopanepolyols  by  an  anoxygenic  phototroph.  PNAS  104, 
15099-15104. 

Raymond,  J.,  Blankenship,  R.E.,  2004.  Biosynthetic  pathways,  gene  replacement  and 
the  antiquity  of  life.  Geobiology  2, 199-203. 

Rohmer, M., Bouvier-Nave, P., Ourisson, G., 1984. Distribution of Hopanoid
Triterpenes in Prokaryotes. Journal of General Microbiology 130, 1137-1150.

Rubinstein, I., Sieskind, O., Albrecht, P., 1975. Rearranged steranes in a shale:
occurrence and simulated formation. Journal of the Chemical Society, Perkin
Transactions 1, 1833-1836.

Schopf,  J.W.,  2006.  Fossil  evidence  of  Archaean  life.  Philosophical  Transactions  of  the 
Royal  Society  B  361,  869-885. 

Schopf, J.W., Walter, M.R., 1983. Archean microfossils: new evidence of ancient
microbes. In: Schopf, J.W. (Ed.), Earth’s earliest biosphere: its origin and
evolution. Princeton University Press, Princeton, NJ, pp. 214-239.

Schröder, S., Lacassie, J.P., Beukes, N.J., 2006. Stratigraphic and geochemical framework
of the Agouron drill cores, Transvaal Supergroup (Neoarchean-Paleoproterozoic,
South  Africa).  South  African  Journal  of  Geology  109, 23-54. 

Shen, Y., Buick, R., 2004. The antiquity of microbial sulfate reduction. Earth Science
Reviews  64,  243-272. 

Shen, Y., Buick, R., Canfield, D.E., 2001. Isotopic evidence for microbial sulphate
reduction  in  the  early  Archaean  era.  Nature  410, 77-81. 

Sherman, L.S., Waldbauer, J.R., Summons, R.E., 2007. Improved methods for isolating
and validating indigenous biomarkers in Precambrian rocks. Organic
Geochemistry 38, 1987-2000.

Sieskind, O., Joly, G., Albrecht, P., 1979. Simulation of the geochemical transformation
of sterols: superacid effects of clay minerals. Geochimica et Cosmochimica Acta
43, 1675-1679.

Sinninghe Damsté, J.S., Kenig, F., Koopmans, M.P., Köster, J., Schouten, S., Hayes, J.M.,
de Leeuw, J.W., 1995. Evidence for gammacerane as an indicator of water column
stratification. Geochimica et Cosmochimica Acta 59, 1895-1900.

Smith,  J.W.,  George,  S.C.,  Batts,  B.D.,  1995.  The  geosynthesis  of  alkylaromatics. 
Organic  Geochemistry  23,  71-80. 

Subroto, E.A., Alexander, R., Kagi, R.I., 1991. 30-Norhopanes: their occurrence in
sediments  and  crude  oils.  Chemical  Geology  93, 179-192. 


Subroto, E.A., Alexander, R., Pranjoto, U., Kagi, R.I., 1992. The use of 30-norhopane
series, a novel carbonate marker, in source rock to crude oil correlation in the
North Sumatra basin. In: Proceedings of the Indonesian Petroleum Association
21st  Annual  Convention,  pp.  145-163. 

Summons, R.E., Jahnke, L.L., Hope, J.M., Logan, G.A., 1999. 2-Methylhopanoids as
biomarkers  for  cyanobacterial  oxygenic  photosynthesis.  Nature  400,  554-556. 

Summons, R.E., Bradley, A.S., Jahnke, L.L., Waldbauer, J.R., 2006. Steroids,
triterpenoids and molecular oxygen. Philosophical Transactions of the Royal Society
B  361,  951-968. 

Sumner, D.Y., Beukes, N.J., 2006. Sequence stratigraphic development of the
Neoarchean  Transvaal  carbonate  platform,  Kaapvaal  Craton,  South  Africa.  South 
African  Journal  of  Geology  109, 11-22. 

Sumner, D.Y., Bowring, S.A., 1996. U-Pb geochronologic constraints on deposition of
the Campbellrand Subgroup, Transvaal Supergroup, South Africa. Precambrian
Research  79,  25-35. 

Tyler, R., Tyler, N., 1996. Stratigraphic and structural controls on gold
mineralization in the Pilgrim’s Rest goldfield, eastern Transvaal, South Africa. Precambrian
Research  79, 141-169. 

van Kaam-Peters, H.M.E., Köster, J., van der Gaast, S.J., Dekker, M., de Leeuw, J.W.,
Sinninghe  Damste,  J.S.,  1998.  The  effect  of  clay  minerals  on  diasterane/sterane 
ratios.  Geochimica  et  Cosmochimica  Acta  62, 2923-2929. 

Ventura, G.T., Kenig, F., Reddy, C.M., Schieber, J., Frysinger, G.S., Nelson, R.K., Dinel, E.,
Gaines, R.B., Schaeffer, P., 2007. Molecular evidence of Late Archean archaea and
the  presence  of  a  subsurface  hydrothermal  biosphere.  PNAS  104, 14260-14265. 

Visser,  J.N.J.,  1989.  The  Permo-Carboniferous  Dwyka  Formation  of  Southern  Africa: 
deposition  by  a  predominantly  subpolar  marine  ice  sheet.  Palaeogeography, 
Palaeoclimatology, Palaeoecology 70, 377-391.

Volk, H., Dutkiewicz, A., George, S.C., Ridley, J., 2003. Oil migration in the Middle
Proterozoic  Roper  Superbasin,  Australia:  evidence  from  oil  inclusions  and  their 
geochemistries.  Journal  of  Geochemical  Exploration  78-79, 437-441. 

Volkman, J.K., 2005. Sterols and other triterpenoids: source specificity and evolution
of  biosynthetic  pathways.  Organic  Geochemistry  36, 139-159. 

Volkman, J.K., Alexander, R., Kagi, R.I., Woodhouse, G.W., 1983. Demethylated
hopanes in crude oils and their applications in petroleum geochemistry.
Geochimica et Cosmochimica Acta 47, 785-794.

Walraven, F., Martini, J., 1995. Zircon Pb-evaporation age determinations of the
Oak  Tree  Formation,  Chuniespoort  Group,  Transvaal  Sequence:  implications  for 
Transvaal-Griqualand  West  basin  correlations.  South  African  Journal  of  Geology 
98, 58-67. 

Walter, M.R., 1983. Archean stromatolites: evidence of Earth’s earliest benthos. In:
Schopf, J.W. (Ed.), Earth’s earliest biosphere. Princeton University Press,
Princeton, NJ, pp. 187-213.

Walter,  M.R.,  Buick,  R.,  Dunlop,  J.S.R.,  1980.  Stromatolites  3,400-3,500  Myr  old  from 
the  North  Pole  area,  Western  Australia.  Nature  284, 443-445. 

Zang, W.-L., 2007. Deposition and deformation of late Archaean sediments and
preservation  of  microfossils  in  the  Harris  Greenstone  Domain,  Gawler  Craton, 
South  Australia.  Precambrian  Research  156, 107-124. 

Zundel, M., Rohmer, M., 1985a. Hopanoids of the methylotrophic bacteria
Methylococcus capsulatus and Methylomonas sp. as possible precursors of C29 and C30
hopanoid  chemical  fossils.  FEMS  Microbiology  Letters  28, 61-64. 

Zundel, M., Rohmer, M., 1985b. Prokaryotic triterpenoids. 1. 3β-Methylhopanoids
from Acetobacter sp. and Methylococcus capsulatus. European Journal of
Biochemistry 150, 23-27.




Chapter  Three 

Microaerobic  steroid  biosynthesis  and  the  molecular  fossil  record  of 

Archean  life 

Jacob  R.  Waldbauer,  Dianne  K.  Newman  and  Roger  E.  Summons 


The power of molecular oxygen to drive many crucial biogeochemical reactions, from
respiration  to  weathering,  makes  reconstructing  the  history  of  its  production  and 
accumulation  a  first-order  question  for  understanding  earth’s  evolution.  Among  the 
various  geochemical  proxies  for  the  presence  of  oxygen  in  the  environment,  molecular 
fossils  present  a  record  of  O2  where  it  was  first  produced  and  consumed  by  biology,  in  sunlit 
aquatic  habitats.  As  steroid  biosynthesis  requires  molecular  oxygen,  fossil  steranes  have 
been  key  to  inferences  about  aerobiosis  in  the  early  Precambrian,  but  better  quantitative 
constraints  on  the  oxygen  requirement  of  this  biochemistry  would  clarify  these  molecular 
fossils’  implications  for  environmental  conditions  at  the  time  of  their  production.  Here  we 
demonstrate  that  steroid  biosynthesis  is  a  microaerobic  process,  enabled  by  dissolved 
oxygen  concentrations  in  the  nanomolar  range.  We  also  present  evidence  that  microaerobic 
marine  environments  were  likely  to  have  been  widespread  and  persistent  for  long  periods  of 
time  prior  to  the  earliest  geologic  evidence  for  atmospheric  oxygen. 


1. Geochemical records of oxidation, oxygenation and evolution


Over  the  course  of  earth’s  history,  life  has  left  one  particularly  indelible  mark:  the  surface 
of  the  planet  is  far  more  oxidized  than  it  would  be  without  biological  activity.  Life  has 
done  this  by  transferring  electrons  from  a  number  of  donors  (notably  oxygen,  sulfur  and 
iron)  to  carbon,  for  the  purpose  of  producing  the  organic  carbon  that  is  the  stuff  of 
biomass.  In  doing  so,  life  has  effected  a  net  reduction  of  carbon  and  caused  oxidized 
waste  products,  including  molecular  oxygen,  sulfate,  and  ferric  iron,  to  accumulate  in  the 
atmosphere,  oceans  and  upper  crust  to  much  higher  levels  than  would  be  present 
otherwise.  This  net  export  of  oxidizing  power  by  the  carbon  cycle  has  gone  on  more  or 
less  continuously  since  at  least  the  mid-Archean  (ca.  3.5Ga;  Ga  =  billion  years  ago),  and 
represents  a  biologically-driven,  unidirectional  change  in  the  state  of  the  earth  system. 

In  quantitative  terms,  molecular  oxygen  is  actually  the  smallest  of  these  pools  of 
accumulated oxidants. At present, the oxygen in the atmosphere and dissolved in the
oceans amounts to only about 10% of the oxidizing power represented by sulfate, and
only  2-3%  of  that  represented  by  the  excess  ferric  iron  in  the  crust  over  the  proportion  in 
the  mantle  (Hayes  and  Waldbauer,  2006).  Of  course,  since  oxygen  is  the  most  powerful 
of  those  three  oxidants,  sulfate  and  ferric  iron  can  be  produced  (either  biotically  or 
abiotically)  from  sulfide  and  ferrous  iron,  using  O2  as  the  oxidant.  Hence  much  of  the 
SO4²⁻ and Fe(III) in the modern world also stems from oxygenic primary production. But
the  relatively  small  size  of  the  molecular  oxygen  pool  highlights  the  distinction  between 
oxidation  -  the  net  transfer  of  electrons  from  inorganic  species  to  carbon  -  and 
oxygenation  -  the  accumulation  of  O2  in  the  atmosphere-ocean  system.  The  two  are 
undoubtedly  closely  linked,  but  have  not  necessarily  proceeded  apace.  In  particular, 
secular  oxidation  is  a  global  process  involving  redox  transformations  between  a  wide 
range  of  species,  where  reservoir  effects  and  geophysical  processes  come  into  play  and  a 
number  of  important  fluxes  are  poorly  constrained.  Oxygenation,  on  the  other  hand,  is 
(at  least  conceptually)  simpler:  the  abundance  of  O2  at  a  given  time  represents  the 
integration  of  a  long-term  source-sink  balance,  of  which  the  source  is  biological  and  the 
sink  is  geochemical. 
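
The source-sink view of oxygenation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, a constant biological source and a first-order geochemical sink (loss proportional to the amount of O2 present); neither assumption nor any of the numbers comes from this chapter, and the function name `o2_reservoir` is hypothetical.

```python
import math

def o2_reservoir(t, source_flux, sink_rate, initial=0.0):
    """Reservoir size M(t) for dM/dt = source_flux - sink_rate * M.

    Assumed toy model: constant source, first-order sink.
    Units are arbitrary; the values used below are hypothetical.
    """
    steady_state = source_flux / sink_rate
    return steady_state + (initial - steady_state) * math.exp(-sink_rate * t)

# Same biological source, two different sink strengths.
weak_sink = o2_reservoir(t=1e3, source_flux=1.0, sink_rate=0.01)
strong_sink = o2_reservoir(t=1e3, source_flux=1.0, sink_rate=10.0)
print(weak_sink, strong_sink)
```

In this toy model the reservoir relaxes toward source_flux / sink_rate, so an unchanged rate of oxygen production can coexist with a very small standing stock of O2 as long as the sinks are efficient.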

1.1 The emergence of oxygenic photosynthesis: wildfire or slow burn?

Because  molecular  oxygen  is  a  biological  product,  its  influence  on  biogeochemistry  has  a 
start  date:  the  evolution  of  water-splitting,  oxygenic  photosynthesis.  This  biochemistry 
was  apparently  invented  exactly  once,  by  an  ancestor  of  the  cyanobacteria,  when  type  I 
and  type  II  photosystems  were  combined  with  a  unique  cofactor,  the  oxygen  evolving 
center.  The  specifics  of  how  oxygenic  photosynthesis  came  to  be  are  not  fully  clear  and 
remain  the  subject  of  active  investigation  and  debate  (Allen  and  Martin,  2007).  In 
geochemical  terms,  our  ability  to  infer  the  history  of  the  influence  of  biological  activity 
on  the  redox  state  of  the  earth’s  surface  would  be  greatly  aided  by  knowledge  of  this  start 
date.

The geologic record, helpfully, does provide some evidence of a minimum age for
oxygenic  photosynthesis.  An  ensemble  of  geological  and  geochemical  evidence  for 
atmospheric  oxygenation,  which  has  come  to  be  known  as  the  “Great  Oxidation  Event”, 
points  to  the  period  ca.  2.40-2.45Ga  as  the  time  when  O2  became  a  persistent  and 
geochemically  significant  component  of  the  atmosphere,  which  it  has  remained  ever 
since.  The  suite  of  observed  geochemical  signals  and  their  prevailing  interpretations 
have recently been reviewed (Catling and Claire, 2005; Buick, 2008; Sessions et al.,
2009);  suffice  it  here  to  say  that  ca.  2.45Ga  stands  as  a  minimum  age  by  which  oxygenic 
photosynthesis  must  have  arisen.  The  basic  question,  then,  is:  does  the  Great  Oxidation 
Event  (GOE)  actually  document  the  evolution  of  oxygenic  photosynthesis,  or  is  the 
process  significantly  older?  In  other  words,  was  the  spread  of  oxygenic  photoautotrophs 
a  wildfire  or  a  slow  burn? 

The  appearance  of  the  GOE  ensemble  in  the  geologic  record  could  be  taken  as  prima 
facie  evidence  of  the  dawn  of  oxygenic  photosynthesis,  an  interpretation  to  which,  with 
amendments, some authors hold (Kopp et al., 2005; Kirschvink and Kopp, 2008). It is
important  to  keep  in  mind,  however,  the  distinction  between  the  accumulation  of  O2  in 
the  atmosphere  and  its  production  by  microorganisms  in  aquatic  environments.  The 
latter  is  necessary  but  not  sufficient  for  the  former;  the  oxygenation  of  the  atmosphere 
depends  on  the  integrated  imbalance  between  sources  and  sinks  of  oxygen.  The 
interpretation  of  the  GOE  as  marking  the  origin  of  oxygenesis  requires  that  this  balance 
have  been  upset  rapidly  by  vigorous  O2  production,  overwhelming  the  capacity  of 
existing  sinks. 

The  source  term  for  atmospheric  oxygen  in  the  early  Precambrian  is  relatively  simple:  net 
primary  production  (photosynthesis  in  excess  of  local  aerobic  respiration)  drives 
supersaturation  of  O2  in  surface  waters,  which  results  in  a  sea-to-air  flux  of  oxygen. 
(Throughout,  we  will  refer  to  oxygen  production  in  marine  environments,  since  it  is 
marine sedimentary sequences that bear geochemical records of oxygenation. This does
not, however, exclude the possibility that lacustrine or other freshwater environments
contributed  to  the  oxygen  flux  -  there  is  simply  very  little  geological  record  of  such 
environments  from  the  early  Precambrian.)  The  sink  or  loss  term,  on  the  other  hand,  is 
rather  complicated:  a  wide  variety  of  geochemical  pathways  exist  for  oxygen  reduction, 
involving  many  atmospheric  and  mineral  reaction  partners.  Fortunately,  the  loss  of 
molecular  oxygen  from  the  Archean  atmosphere  has  been  the  subject  of  several  recent, 
detailed modeling studies (Pavlov and Kasting, 2002; Claire et al., 2006; Goldblatt et al.,
2006; Zahnle et al., 2006), which provide valuable constraints on how the composition of
the  atmosphere  might  evolve  given  a  biogenic  O2  flux. 

The  scenario  of  rapid  spread  of  newly-evolved  oxygenic  photoautotrophs  triggering  the 
GOE  is  predicated  on  the  idea  that,  prior  to  ca.  2.45Ga,  marine  primary  producers  were 
strongly  and  globally  limited  by  the  availability  of  reductant.  If  iron-  and  sulfide-  and 
hydrogen-oxidizing  autotrophs  were  limited  only  by  supplies  of  their  respective 
reductants,  and  not  by  any  other  factor,  then  the  cyanobacteria’s  newly-developed  ability 
to  use  water  as  a  reductant  for  carbon  fixation  would  have  given  them  a  profound 
ecological  advantage.  This  assumption,  however,  is  quite  a  strong  one.  It  requires  that 
all  other  growth  substrates  (including  phosphate,  fixed  nitrogen,  trace  metals  and  light)  be 
in  plentiful  supply,  and  be  underexploited  by  anoxygenic  primary  producers,  presumably 
because  of  their  reductant  limitation.  This  notion  contradicts  the  tendency  of  ecosystems 
towards  co-limitation,  where  overabundance  of  one  substrate  relative  to  another  drives  its 
preferential  exploitation  until  multiple  forms  of  limitation  occur  simultaneously. 
Anoxygenic  autotrophs  are  not  necessarily  less  efficient  at  nutrient  or  light  harvesting 
than  cyanobacteria,  and  were  well-adapted  to  their  respective  environments. 
Cyanobacteria  had  to  ‘shoulder  aside’  and  gradually  outcompete  these  prior  inhabitants  in 
order  to  occupy  their  sunlit  habitats.  In  doing  so,  they  had  the  distinct  advantage  of  never 
facing  reductant  limitation  and  producing  a  metabolic  waste  product  that  was  deleterious 
to  their  competitors.  But  it  is  a  strong  and,  in  our  view,  unsupported  assumption  to  posit 
that this global-scale process of ecological succession occurred in a geologic eyeblink.

The concept of a rapid takeover of global primary production by oxygenic
photoautotrophs  at  2.4Ga  is  also  at  odds  with  longer-term  geologic  records.  For  all  the 
geologic  evidence  of  the  appearance  of  atmospheric  oxygenation  at  that  time,  there  is 
relatively  little  evidence  of  large-scale  oxidation.  The  period  is  not  marked  by  unusually 
large  deposits  of  organic  carbon,  and  the  only  perturbations  to  the  isotopic  compositions 
of  sedimentary  carbon  reservoirs  come  later,  closer  to  2.1Ga,  and  are  more  plausibly 
related  to  changes  in  global  patterns  in  diagenesis  than  to  enhanced  organic  carbon  burial 
(Hayes  and  Waldbauer,  2006).  These  patterns  are  inconsistent  with  the  notion  that  there 
were  large  excesses  of  nutrients  just  waiting  to  be  unlocked  and  rapidly  consumed  by 
water-splitting  primary  producers.  The  persistence  of  deep-water  anoxia  and  coexistence 
of oxygenic and anoxygenic primary producers in marine settings for another nearly two
billion years (Johnston et al., 2009), until the latest Neoproterozoic, is further evidence of
the  gradual  nature  of  the  succession  in  oceanic  primary  production. 

There is also another good reason to doubt that the GOE indicators immediately
recorded  the  emergence  of  oxygenic  photosynthesis:  other  proxies  for  molecular  oxygen 
document  its  production  and  consumption  long  before  its  accumulation  in  the 
atmosphere.  The  next  section  discusses  the  distinctions  between  what  is  recorded  by  the 
various  geochemical  proxies  for  oxygenation. 

1.2  Proxies  for  oxygenation 

Most  of  what  we  know  about  the  history  of  the  oxygenation  of  the  surface  environment 
comes  from  geologic  and  geochemical  proxies  that  are  sensitive  to  the  involvement  of  O2 
in weathering processes and/or atmospheric chemistry. The ensemble of ‘classic’ GOE
indicators falls into this category: oxidized paleosols record oxic chemical weathering
directly, detrital redox-sensitive minerals such as pyrite, siderite and uraninite record
erosion and anoxic fluvial transport, and the sharp termination of mass-independent
fractionation of S isotopes that cemented the GOE paradigm documents a change in
atmospheric chemistry brought on by oxygenation. A number of other oxygen proxies
track the mobility, speciation and/or isotopic fractionation of redox-sensitive metals,
including Mo, Re (Anbar et al., 2007), U (Coogan, 2009), Cr (Frei et al., 2009) and Fe
(Reinhard et al., 2009), and also test the involvement of oxygen in weathering, transport
and sedimentation processes.

For  signals  to  be  recorded  in  these  geologic  and  geochemical  proxies,  oxygen  must  have 
already  accumulated  in  the  atmosphere  to  some  extent  and  persisted  there  for  a 
geologically-significant  period  of  time.  As  noted  in  the  previous  section,  there  are 
reasons  to  question  the  assumption  that  signals  would  necessarily  appear  in  these  proxies 
promptly  after  the  evolution  of  oxygenic  photosynthesis.  They  are  at  somewhat  of  a 
remove  -  separated  by  considerations  of  transport  and  source-sink  balances  -  from  the 
biology  of  oxygen  production.  Ideally,  we  could  find  a  direct  proxy  for  the  oxygenic 
biochemistry  itself;  unfortunately,  the  molecular  components  of  the  water-splitting 
apparatus  do  not  leave  identifiable  fossil  remains.  The  body  fossil  record  of  early 
Precambrian  microbes  is  also  very  poor,  and  even  those  microstructures  accepted  as 
biogenic  do  not  reliably  inform  taxonomic  or  physiological  identifications. 

A  remarkable  recent  finding  suggests  what  may  be  some  of  the  most  immediate  fossil 
evidence  yet  for  oxygenic  photosynthetic  microbes.  Bosak  et  al.  (2009)  examined  the 
conical  structures  built  by  some  filamentous  cyanobacteria,  which  are  plausible 
analogues  for  types  of  stromatolites  seen  in  the  Archean  rock  record.  They  observed  that 
oxygenic  photosynthesis  by  these  organisms  resulted  in  the  formation  of  oxygen  bubbles, 
which  accumulated  near  the  tips  of  the  cones.  Spherical  features  similar  to  bubbles  are 
also  seen  enclosed  by  mat  laminae  in  the  crests  of  some  late  Archean  stromatolites, 
beginning  about  2.7Ga,  and  Bosak  et  al.  (2009)  detail  why  these  features  are  likely 
signatures of oxygenic photosynthesis. If so, they are very nearly “fossil oxygen”.

Hydrocarbon molecular fossils (biomarkers) have provided some of the earliest evidence
for the production and biological utilization of molecular oxygen (Brocks et al., 1999;
Brocks et al., 2003; Dutkiewicz et al., 2006; George et al., 2007; Eigenbrode et al., 2008).
Chapter  2  of  this  thesis  presents  a  molecular  fossil  record  from  late  Archean  sedimentary 
rocks of the Transvaal Supergroup, dating from ca. 2.67-2.46Ga. As diagenetically-altered,
but still identifiable biochemicals, biomarkers offer a proxy for biological activity that
was once an actual functional component of living cells. A few types of
molecular fossils are specific proxies for either oxygenic organisms or aerobic
biochemistry. 2-methylhopanoids are markers for at least the preexistence of
cyanobacteria, if not their direct input to sedimentary organic matter (Summons et al.,
1999; Sessions et al., 2009). 3-methylhopanoids, particularly when found strongly
depleted in 13C, are markers for aerobic Type I methanotrophs, whose preferred habitat at
the chemocline between oxygenic and methanogenic habitats was likely widespread in
the late Archean (Blumenberg et al., 2007). This chapter focuses on the third class of
biomarker  proxies  for  oxygen,  the  steranes.  All  of  these  molecules  have  been  found  in 
sedimentary  deposits  dating  as  far  back  as  2.72Ga,  including,  as  documented  in  Chapter 
2,  in  the  samples  with  the  strongest  contamination  controls.  While  the  challenges 
involved in their analysis and interpretation are considerable (Sherman et al., 2007),
molecular  fossils  from  sedimentary  rocks  provide  a  unique  record  of  O2  where  it  was  first 
produced,  accumulated  and  utilized:  sunlit  aquatic  ecosystems. 

Fossils of steroids have been especially significant as proxies for oxygenation, since
steroid biosynthesis specifically requires molecular oxygen. A summary view of the
biosynthetic pathway of steroids is shown in Figure 1. As indicated, there are at least
four steps that require O2. The first oxygen-dependent step is also the first committed
step in steroid synthesis: the epoxidation of the isoprenoid squalene. The enzyme that
cyclizes oxidosqualene to form the characteristic 6,6,6,5 steroid ring structure cannot act
on squalene itself, but requires the (3S)-2,3-epoxide (Summons et al., 2006).
Subsequently, to produce the 4,14-desmethyl sterols that make up the vast majority of
both membrane lipids and molecular fossils, nine more molecules of O2 are necessary to
effect three oxidative demethylation reactions. The synthesis of some steroids (e.g.,
cholesterol) requires yet more oxygen to introduce unsaturations into the carbon skeleton.
Steroid biosynthesis is highly evolutionarily conserved, and was present in the last
common ancestral lineage of all extant eukaryotes (Summons et al., 2006; Desmond and
Gribaldo, 2009).

Figure 1. An abbreviated summary of the biosynthesis of sterols in yeast, showing the
flow of carbon from glucose to ergosterol. The first oxygen-requiring step is the
epoxidation of squalene; subsequently, O2 is also required for the removal of three
methyl groups from the sterol ring system to produce the functional membrane lipid.
In these experiments, cells can either synthesize sterols de novo (and hence with a
label) or take up unlabeled ergosterol from the medium.

Steranes  -  defunctionalized,  saturated,  diagenetically  altered  fossil  steroids  -  have  been 
detected  in  sedimentary  rocks  as  old  as  2.72Ga,  roughly  300  million  years  before  the 
GOE.  The  presence  of  steranes  in  rocks  of  this  age  implies  that  some  molecular  oxygen 
was  available  in  the  marine  environments  that  produced  the  organic  matter  preserved  in 
late  Archean  sediments.  The  implication  is  that  the  integrated  source-sink  imbalance  for 
atmospheric  oxygen  remained  very  small  for  hundreds  of  millions  of  years  after  the 
evolutionary  origin  of  oxygenic  photosynthesis,  keeping  atmospheric  O2  somewhere 
below 10^-5 of its modern level. At present, however, constraints on the environmental
oxygen  concentration  that  enables  steroid  biosynthesis  are  limited:  Jahnke  and  Klein 
(1983) reported an apparent oxygen Km (half-saturation constant) for squalene epoxidase
of 4.3µM, and Rogers and Stewart reported apparent O2 Km values for ergosterol contents
in yeast of 0.5µM (Rogers and Stewart, 1973b) and “0.3µM or less” (Rogers and Stewart,
1973a).  The  former  is  an  in  vitro  measure  of  a  crude  preparation  of  a  single  part  of  the 
pathway,  while  the  latter  reflects  a  physiological  response  (cellular  sterol  levels)  rather 
than  biosynthesis  itself;  nor  have  these  experiments  been  repeated  in  the  years  since. 
Hence  it  is  difficult  to  assess  the  consistency  between  interpretations  of  the  biomarker 
record  and  reconstructions  of  Archean  biogeochemistry  and  atmospheric  evolution.  The 
experiments  presented  in  this  chapter  seek  to  provide  constraints  on  the  oxygen 
requirement  of  sterol  biosynthesis  that  more  directly  apply  to  the  interpretation  of  the 
molecular  fossil  record. 
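
To give a feel for what the apparent Km values quoted above would imply, the following minimal sketch evaluates a simple Michaelis-Menten saturation curve at a few dissolved O2 levels. It assumes that single-substrate Michaelis-Menten kinetics apply to each reported value, which the original studies do not establish for the whole pathway. The test concentrations and the function name `fraction_of_vmax` are hypothetical choices made for illustration.

```python
# Apparent O2 half-saturation constants quoted in the text, in micromolar.
KM_UM = {
    "squalene epoxidase, in vitro (Jahnke and Klein, 1983)": 4.3,
    "ergosterol content (Rogers and Stewart, 1973b)": 0.5,
    "ergosterol content (Rogers and Stewart, 1973a; upper limit)": 0.3,
}


def fraction_of_vmax(o2_um: float, km_um: float) -> float:
    """Michaelis-Menten saturation: v / Vmax = [O2] / (Km + [O2]).

    Assumption: simple single-substrate kinetics; both arguments in micromolar.
    """
    if o2_um < 0 or km_um <= 0:
        raise ValueError("O2 must be non-negative and Km must be positive")
    return o2_um / (km_um + o2_um)


if __name__ == "__main__":
    # Hypothetical dissolved O2 levels (micromolar), chosen only for illustration.
    for label, km in KM_UM.items():
        for o2 in (0.01, 0.1, 1.0):
            pct = 100.0 * fraction_of_vmax(o2, km)
            print(f"{label}: [O2] = {o2} uM -> {pct:.2f}% of Vmax")
```

Under these assumptions, an enzyme with a Km of several micromolar would run at well under 1% of its maximal rate at 10nM O2. That is one reason in vitro Km values alone are hard to translate into an environmental threshold.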

To directly assay the oxygen requirements of steroid biosynthesis, we adopted an isotopic
labeling strategy. Yeast (S. cerevisiae) was chosen as a test organism because it is a
facultatively anaerobic eukaryote, able to grow at any oxygen concentration. Yeast cells
were grown in a defined minimal medium containing 13C-glucose and unlabeled (12C-)
ergosterol. Since steroid biosynthesis requires molecular oxygen, the cells are obligately
auxotrophic for steroids under anaerobic conditions, and take up the supplied (unlabeled)
ergosterol. Under aerobic conditions, however, steroids are made de novo from carbon
substrates, and so acquire the 13C label from glucose. By analyzing the cells’ lipid
complements using mass spectrometry, we could examine the incorporation of 13C into
specific  compounds,  including  sterols.  Our  goal  was  to  determine  the  lowest  dissolved 
oxygen  concentration  that  would  enable  de  novo  steroid  synthesis. 

2.  Experimental  methods 

2.1  Microaerobic  growth  experiments 

2.1.1  Culture  conditions  and  sampling 

Saccharomyces  cerevisiae  (strain  D273-10B,  ATCC)  cells  were  grown  in  a  defined 
minimal medium containing 4g/l uniformly-labeled 13C6-glucose (99%, Cambridge
Isotope), supplemented with 10mg/l ergosterol (Alfa Aesar) and 0.5ml/l Tween 80
(polyoxyethylene sorbitan monooleate). The Tween 80 provided a source of unsaturated
fatty acids, which yeast also require molecular oxygen to produce. The medium was also
amended with 1ml/l FG-10 (Dow Chemical), a polydimethylsiloxane-based surfactant, as
an antifoaming agent. Cultures were maintained in an anaerobic chamber (Coy) in an
atmosphere of 5% hydrogen, 15% CO2 and <1ppm O2. For microaerobic growth, 6ml of
late log-phase culture (OD600 ~ 0.8) was inoculated into 300ml of media in a bubbler
bottle with the headspace exhausted to external vacuum. Dissolved oxygen concentration
in the media was controlled by vigorous bubbling with O2:N2 mixtures (5-5000ppm O2,
Airgas). Cells were grown with bubbling and stirring for 12-16h, until an OD600 of 0.2
was reached. The cells were then harvested by vacuum filtration in the anaerobic
chamber  onto  a  precombusted  GF/F  filter  (Whatman).  Filters  were  placed  into  glass 
centrifuge  tubes  filled  with  19ml  of  Bligh-Dyer  extraction  solvent  (10:5:4 
chloroform:methanol:water;  Bligh  and  Dyer,  1959)  that  had  been  pre-equilibrated  with 
the  anaerobic  atmosphere,  and  incubated  in  the  anaerobic  chamber  overnight  to  ensure 
complete  enzyme  deactivation  prior  to  oxygen  exposure. 

2.1.2 Dissolved O2 measurements

Oxygen  concentration  in  the  cultures  during  microaerobic  growth  was  measured  by  two 
methods:  a  colorimetric  assay  based  on  Rhodazine-D  (Chemetrics)  and  a  novel  type  of 
microelectrode  with  a  polarizable  front  guard  (Unisense).  The  Rhodazine-D  test,  based 
on a proprietary redox-sensitive chromophore, is convenient and inexpensive, but is
limited in sensitivity to dissolved oxygen concentrations above approximately 300nM. It
is also susceptible to interferences from Fe(III) and Cu(II), and higher concentrations of
suspended particulates (such as cells) can require prefiltration, potentially limiting
accuracy. The recently-developed STOX (Switchable Trace Oxygen) electrode
(Revsbech et al., 2009) allows in situ electrochemical detection of O2 down to
concentrations as low as 5-10nM (equivalent to equilibrium at 25°C and 1atm with an
atmosphere  containing  4-8ppmv  O2). 
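
The equivalence between gas-phase mixing ratio and dissolved concentration quoted above can be checked with a short Henry's-law calculation. This is a sketch under stated assumptions: the solubility constant (about 1.3e-3 mol per litre per atm for O2 in fresh water at 25°C) is a commonly tabulated approximate value rather than a number from this chapter, and salinity and medium composition are ignored. The function name `dissolved_o2_nM` is hypothetical.

```python
# Assumed Henry's-law solubility of O2 in fresh water at 25 C, in mol/(L*atm).
# Approximate textbook value; NOT taken from the thesis, and salinity is ignored.
KH_O2_25C = 1.3e-3


def dissolved_o2_nM(o2_ppmv: float, pressure_atm: float = 1.0,
                    kh: float = KH_O2_25C) -> float:
    """Equilibrium dissolved O2 (nanomolar) under a gas with the given O2 mixing ratio."""
    if o2_ppmv < 0:
        raise ValueError("mixing ratio must be non-negative")
    partial_pressure_atm = o2_ppmv * 1e-6 * pressure_atm
    return kh * partial_pressure_atm * 1e9  # mol/L -> nmol/L


if __name__ == "__main__":
    # Detection-limit equivalents quoted in the text, plus the ends of the
    # 5-5000 ppm range of bubbling gas mixtures described in Section 2.1.1.
    for ppmv in (4, 8, 5, 5000):
        print(f"{ppmv} ppmv O2 -> {dissolved_o2_nM(ppmv):.1f} nM dissolved O2")
```

With this assumed constant, 4-8ppmv corresponds to roughly 5-10nM, in line with the range stated for the STOX electrode.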

2.2  Lipid  analysis 

2.2.1 Lipid extraction/derivatization

Tubes  containing  filters  and  solvent  were  removed  from  the  anaerobic  chamber  and 
ultrasonicated for 20 minutes. Tubes were then centrifuged for 5 minutes at 1000xg, the
supernatant decanted (first extract) and 19ml Bligh-Dyer solvent added again. After a
second ultrasonication and centrifugation, the second extract was pooled with the first,
and 10ml water and 10ml chloroform added to induce phase separation. Phase separation
continued overnight at -20°C, after which the upper phase and interfacial debris were
removed and the lower phase concentrated under N2. Extracts were filtered over 1cm of
silica gel in pipette columns, eluting lipids with 8:2 dichloromethane:ethyl acetate. Lipid
extracts were evaporated to dryness to determine yield, and 1mg of lipid was reacted with 40µl
BSTFA + 1% TMCS (Alfa Aesar) and 40µl pyridine (Acros) at 70°C for 60 minutes to
produce  trimethylsilyl  derivatives. 

2.2.2  Gas  chromatography/Mass  spectrometry 

Derivatized  lipid  extracts  were  analyzed  by  gas  chromatography-mass  spectrometry  on  a 
7890A GC / 5975C MSD system (Agilent). 1µl of extract was injected onto a DB-1 or
DB-5 column (0.250mm ID, 0.25µm film, 30m length; Agilent). Following the solvent
delay, the oven ramped from 70°C to 230°C at 20°C/min; carrier helium flow was
1ml/min. Analytes were ionized by electron impact at 70eV.

2.3  Modeling  of  Archean  oxygen  fluxes 

To  link  our  experimental  determination  of  the  oxygen  requirements  of  steroid 
biosynthesis  with  models  of  Archean  atmospheric  evolution,  we  constructed  a  simple 
model  for  the  O2  outgassing  flux  from  a  partially-oxygenated  surface  ocean.  Under  an 
anaerobic  atmosphere,  the  saturation  level  of  oxygen  in  the  ocean  is  very  small,  ingassing 
is  negligible,  and  the  net  sea-to-air  gas  flux  can  be  expressed  as: 

$$J_{O_2} = \alpha \, A_e \, V_p \, [O_2]_\alpha$$

where $J_{O_2}$ is the sea-to-air oxygen flux, $\alpha$ is the proportion of the surface area of the earth
($A_e$) contributing to that oxygen flux, $V_p$ is the globally-averaged gas exchange piston
velocity, and $[O_2]_\alpha$ is the average oxygen concentration of surface waters in the area
represented by $\alpha$. $A_e$ is taken as the present value, 510 million km^2. Air-sea gas
exchange, while strongly dependent on climatic variables such as wind speed,
temperature and sea state that are poorly constrained for the Archean ocean, is an
essentially physical process likely to have been more or less constant over time. Here we
utilize the modern, long-term global average value of 16cm/h of Glover et al. (2007). We
note that $V_p$ is unlikely to vary by more than an order of magnitude, while the modeled
ranges of interest of $J_{O_2}$ and $[O_2]_\alpha$ span roughly four orders of magnitude, so overall the
model is not expected to be highly sensitive to the precise physics of gas exchange.

Figure 2. Mass spectra of unlabeled (A) and 13C-labeled (B) trimethylsilyl derivatives
of ergosterol. Incorporation of 13C6-glucose results in broad peaks in the mass
spectrum, shifted to higher masses than the unlabeled compound. Fragment ion
assignments are after Brooks et al. (1968).
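
The gas-exchange relation of Section 2.3 (sea-to-air flux equal to area fraction times earth surface area times piston velocity times surface O2 concentration) can be sketched in a few lines. The earth surface area and piston velocity are the values given in the text. The example flux (1e13 mol O2 per year) and area fraction (0.1) are hypothetical inputs chosen only to show the unit handling, not values taken from the cited atmospheric models. The function names are likewise illustrative.

```python
# Values stated in the text.
EARTH_SURFACE_AREA_M2 = 510e6 * 1e6           # 510 million km^2, expressed in m^2
PISTON_VELOCITY_M_PER_YR = 0.16 * 24 * 365    # 16 cm/h (Glover et al., 2007), in m/yr


def surface_o2_nM(flux_mol_per_yr: float, area_fraction: float,
                  piston_velocity: float = PISTON_VELOCITY_M_PER_YR,
                  area: float = EARTH_SURFACE_AREA_M2) -> float:
    """Average surface-water O2 (nM) needed to sustain a given sea-to-air flux.

    Rearranges flux = area_fraction * area * piston_velocity * [O2],
    assuming ingassing is negligible (anaerobic atmosphere).
    """
    if not 0 < area_fraction <= 1:
        raise ValueError("area_fraction must be in (0, 1]")
    conc_mol_per_m3 = flux_mol_per_yr / (area_fraction * area * piston_velocity)
    return conc_mol_per_m3 * 1e6  # mol/m^3 -> nmol/L


def sea_to_air_flux(o2_nM: float, area_fraction: float,
                    piston_velocity: float = PISTON_VELOCITY_M_PER_YR,
                    area: float = EARTH_SURFACE_AREA_M2) -> float:
    """Forward form of the same relation: flux in mol O2 per year."""
    return area_fraction * area * piston_velocity * (o2_nM * 1e-6)


if __name__ == "__main__":
    # Hypothetical inputs: 1e13 mol O2/yr leaving 10% of the earth's surface.
    print(f"{surface_o2_nM(1e13, 0.1):.1f} nM")
```

Because the concentration scales inversely with the area fraction, concentrating the same flux into a smaller region raises the implied local supersaturation in direct proportion.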

3.  Results:  steroid  biosynthesis  is  a  microaerobic  process 

3.1  The  oxygen  requirements  of  steroid  biosynthesis 

The incorporation of the 13C label from glucose produced a distinct shift in the
appearance of lipid mass spectra (Fig. 2). 13C-labeled compounds showed large peaks in
the  mass  spectra,  shifted  up  the  mass  scale  relative  to  peaks  in  the  spectra  of  unlabeled 
compounds.  These  distinct  spectra  provide  an  unambiguous  signal  of  de  novo  lipid 
biosynthesis. 
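
As a rough illustration of why de novo synthesis gives such a distinct spectrum, the expected mass shift of the ergosterol molecular ion can be estimated from a binomial model of label incorporation. The sketch assumes that all 28 ergosterol carbons derive from the 99% 13C glucose and that the three trimethylsilyl carbons, which come from the derivatization reagent, stay unlabeled. It uses nominal (integer) masses and ignores natural-abundance isotopes of H, O and Si and any dilution by unlabeled carbon sources, so it is not a simulation of the measured spectra.

```python
from math import comb

ERGOSTEROL_CARBONS = 28        # ergosterol is C28H44O
TMS_ERGOSTEROL_NOMINAL_M = 468 # nominal mass of the unlabeled TMS ether, C31H52OSi


def isotopologue_distribution(n_carbons: int, enrichment: float) -> list[float]:
    """Binomial probability of finding k 13C atoms among n biosynthetic carbons."""
    if not 0.0 <= enrichment <= 1.0:
        raise ValueError("enrichment must be between 0 and 1")
    return [
        comb(n_carbons, k) * enrichment**k * (1.0 - enrichment) ** (n_carbons - k)
        for k in range(n_carbons + 1)
    ]


if __name__ == "__main__":
    dist = isotopologue_distribution(ERGOSTEROL_CARBONS, 0.99)
    # Report the most abundant molecular-ion isotopologues (nominal m/z).
    for k in sorted(range(len(dist)), key=lambda i: dist[i], reverse=True)[:3]:
        mz = TMS_ERGOSTEROL_NOMINAL_M + k
        print(f"m/z {mz}: {100 * dist[k]:.1f}% ({k} of 28 carbons labeled)")
```

Under these assumptions the molecular ion moves from m/z 468 to a cluster centered near m/z 496, with a substantial share of the signal one mass unit lower.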

The carbon-13 label was incorporated into squalene - the last steroid synthesis
intermediate  reachable  in  the  absence  of  O2  -  in  all  conditions  (Fig.  3).  Under  anaerobic 
conditions,  no  lanosterol  was  detected,  and  only  unlabeled  ergosterol  (taken  up  from  the 
medium)  was  present.  This  is  the  pattern  expected  if  steroid  synthesis  is  not  operative, 
and  is  entirely  consistent  with  the  oxygen  dependence  of  sterol  biosynthesis.  Once 
oxygen  was  provided  to  the  culture  by  bubbling,  however,  labeled  steroids  were  produced 
de  novo.  In  the  presence  of  oxygen,  lanosterol  is  synthesized  and  bears  the  label,  and 
labeled  ergosterol  is  also  present. 

Surprisingly, even the lowest oxygen concentration tested to date - estimated at 7nM -
appears to support steroid biosynthesis. This is a remarkably low oxygen level to enable
what has been thought of as a classically aerobic biochemistry. It should be noted,
however, that this result has yet to be duplicated in a replicate experiment, and that the
oxygen concentration is only an estimate, based on the calculated solubility and the
composition of the gas mixture as certified by the manufacturer. At the time this
experiment was performed, we did not have access to the new STOX electrode, and so
had no way of directly measuring such low O2 concentrations. This experiment will be
repeated in the very near future, with the dissolved oxygen monitored electrochemically.

Figure 3. Mass spectra of lipids from microaerobic growth experiments.
Compounds are indicated across the top; dissolved oxygen concentrations in
the various growth experiments are indicated at left. A "+" indicates that the
labeled compound was detected, a "o" indicates that only the unlabeled
compound was detected; n.d., not detected.

With  those  caveats  in  mind,  we  provisionally  interpret  these  results  as  a  useful,  perhaps 
even  conservative,  lower  bound  for  the  oxygen  requirement  for  steroid  synthesis.  While 
yeast  is  certainly  an  incomplete  model  for  the  unknown  diversity  of  eukaryotic  organisms 
in  the  Archean  ocean,  the  fact  that  a  modern  organism  can  perform  this  biochemistry  at 
such  low  oxygen  concentrations  suggests  that  early  eukaryotes,  adapted  to  chronic  O2 
scarcity,  likely  could  as  well.  Notably,  steroid  biosynthesis  is  possible  well  below  the 
Pasteur point of 1% PAL (~2µM O2), above which fermentation is inhibited. The Pasteur
point has sometimes been misinterpreted as a switching point between aerobic and
anaerobic metabolism (Rutten, 1970; Goldblatt et al., 2006). Along with recent results
suggesting microaerobic respiration in E. coli (Stolper et al., 2009), this experiment
shows  that  there  is  a  geochemically-significant  range  of  dissolved  O2  concentrations 
below  the  Pasteur  point  that  enable  aerobic  biochemistry. 

The  exceptionally  low  oxygen  levels  enabling  steroid  synthesis  also  highlight  the  close 
connection  between  steroid  biochemistry  and  oxygen  sensing,  metabolism  and  defense 
(Galea  and  Brown,  2009).  Studies  of  industrial  fermentations  have  documented  that, 
during  aeration  pulses,  the  first  aerobic  biochemistry  to  become  active  is  steroid 
synthesis, not respiration (Rosenfeld et al., 2003). In our experiment, de novo steroid
synthesis occurred despite an abundant supply of exogenous sterols in the growth
medium. It would appear that sterol homeostasis is one of the most sensitive
oxygen-detecting systems in biology, which is consistent with its emergence in microaerobic
aquatic  settings,  well  before  the  oxygenation  of  the  atmosphere. 

3.2 Oxygenation of the Archean surface ocean


Our  results  demonstrate  that  steroid  biosynthesis  is  possible  in  microaerobic 
environments,  which  has  significant  implications  for  the  biogeochemical  interpretation  of 
steranes  in  the  fossil  record.  The  fossil  steranes  in  late  Archean  sedimentary  rocks  can  be 
interpreted  as  evidence  for  microaerobic  conditions  in  a  portion  of  the  marine  water 
column  in  an  otherwise  anaerobic  world.  Is  this  interpretation  of  the  molecular  fossil 
record  consistent  with  other  geochemical  evidence  for  the  evolution  of  atmospheric 
oxygen  levels?  In  particular,  could  microaerobic  regions  of  the  surface  ocean  have 
supplied  the  O2  fluxes  called  for  by  current  models  of  Precambrian  atmospheric 
evolution?  To  answer  this  question,  we  constructed  a  simple  model  of  global  air-sea  gas 
exchange  and  used  it  to  predict  the  mixed-layer  supersaturation  of  dissolved  oxygen 
implied  by  given  values  of  the  O2  flux  to  the  atmosphere  (Sec.  2.3). 

The  results  of  this  model,  shown  in  Figure  4,  indicate  that  microaerobic  ocean  surface 
environments  capable  of  supporting  steroid  biosynthesis  are  entirely  consistent  with  - 
even  required  by  -  models  of  Archean  atmospheric  evolution.  To  provide  the  oxygen 
outgassing  fluxes  called  for  in  the  models  of  Pavlov  and  Kasting  (2002)  and  Zahnle  et  al. 
(2006), the average supersaturation in ocean regions contributing to sea-to-air
oxygen outgassing would need to be in the range of 0.1-1µM. This is 10-100 times our
experimental  determination  of  the  concentration  required  for  sterol  biosynthesis.  These 
models  show  how  such  an  oxygen  flux  can  persist  for  hundreds  of  millions  of  years 
without  causing  oxygenation  of  the  atmosphere  to  the  extent  that  would  ‘trip’  the 
geologic  and  geochemical  proxies  whose  signals  appear  at  or  near  the  GOE.  Even  if  the 
total  flux  were  an  order  of  magnitude  less  than  the  lower  limit  considered  by  Zahnle  et  al. 
(2006),  if  it  were  concentrated  in  high-productivity  hotspots  (“oxygen  oases”),  the  local 
oxygen  concentration  was  likely  to  have  been  high  enough  to  allow  steroids  to  be  made. 

It is important to recognize that no threshold value of sea-to-air O2 flux requires that the
atmosphere become oxygenated; so long as the oxygen is accompanied by inputs of
sufficient reductant, even the modern O2 flux could coexist with an anaerobic atmosphere
(Claire et al., 2006; Goldblatt et al., 2006; Zahnle et al., 2006).

Figure 4. The modeled average dissolved oxygen concentration (i.e., supersaturation)
in oxygenated regions of the Archean ocean, as a function of the sea-to-air O2 flux and
the proportion of the earth's surface contributing to that flux. The larger the outgassing
flux, and the smaller the fraction of the earth's surface it comes from, the more
oxygenated surface waters in that region must be. Red bars indicate the range of
oxygen fluxes in the Archean atmospheric evolution models of Pavlov & Kasting
(2002) (PK02) and Zahnle et al. (2006) (ZCC06). Orange stars indicate approximate
modern values for annual net biogenic marine oxygen outgassing and associated
surface-water supersaturations (Najjar & Keeling 2000).

The  biogeochemical  distribution  of  oxygen  in  the  Archean  ocean  likely  resembled  (in 
general  terms)  that  of  biogenic  trace  gases  in  the  modern  ocean,  such  as  dimethylsulfide 
(DMS).  In  fact,  the  parallels  between  Archean  oxygen  and  modern  DMS  are  striking: 
both  have  biological  sources  in  the  upper  water  column,  vanishingly  small  atmospheric 
concentrations and atmospheric lifetimes of hours to days (Kettle et al., 1999; Kloster et
al., 2007). Oxygen in the Archean ocean probably had rather heterogeneous geographic
and seasonal distributions, as DMS does today. Moreover, in both cases, some degree of
supersaturation of the surface ocean is persistent over long time scales (a phenomenon
also seen with some abiogenic gases, such as 222Rn) and biologically relevant.

The past O2 content of the surface ocean cannot be inferred accurately from atmospheric
O2 levels alone, nor should its concentration be assumed to have been homogeneous.
This is especially true when considering biochemical processes, since, as steroid
biosynthesis exemplifies, micromolar-scale deviations from saturation are highly
significant. Hence, the geologic record of atmospheric O2 does not directly constrain the
evolutionary history of aerobic biochemistries, as has sometimes been assumed
(Goldblatt et al., 2006; de Duve, 2007). While the oxygenation of the ocean may seem
trivial  compared  to  that  of  the  atmosphere  (the  ocean  contains  only  about  0.5%  of  the  O2 
in  the  modern  world),  the  ocean  has  played  a  pivotal  role  as  a  source  region  for  oxygen 
production. 

4. A brief portrait of life & geochemistry in the late Archean

The support from molecular (e.g., Chapter 2), morphological (Bosak et al., 2009) and
newer geochemical (Frei et al., 2009) proxies for microaerobic surface ocean
environments as far back as 2.7Ga adds new features to the picture of Archean
biogeochemistry. A cartoon representation of a late Archean coastal ocean environment
and  some  of  the  biogeochemical  processes  that  went  on  there  is  shown  in  Figure  5.  A 
wide  variety  of  microbial  metabolisms  were  probably  present,  in  an  ecosystem  every  bit 
as  spatially  structured  as  those  of  the  Phanerozoic.  Microaerobic  regions  where 
cyanobacteria  were  burgeoning  primary  producers  provided  an  outgassing  flux  of  O2  to 
the  atmosphere.  While  most  gross  primary  production  was  likely  locally  aerobically 
respired (as it is today), any net export of organic matter from microaerobic zones was, in a
low-Fe(III), low-sulfate ocean, most likely subject to methanogenic diagenesis. Where the
methane so produced met the fringes of the oxygenated zones, aerobic methanotrophy
would have been active and formed another sink for O2. The biogenic gases that escaped
the photosynthesis-respiration-methanogenesis-methanotrophy loop in the water column
and evaded to the atmosphere would have provided a stoichiometric 2:1 flux of
oxygen:methane that, in the absence of an ozone shield, would have undergone rapid
photochemical recombination (Claire et al., 2006). Catling et al. (2001) have noted that
the  small,  steady  leak  of  methane  out  of  this  atmospheric  process  -  due  to  hydrogen 
escape  to  space  -  may  have  been  essential  to  the  eventual  oxygenation  of  the  atmosphere. 
Meanwhile,  hydrothermal  supplies  of  reductants  (including  H2  produced  by 
serpentinization  of  basaltic  crust  (Hayes  and  Waldbauer,  2006))  fueled  various 
phototrophic  and  chemotrophic  metabolisms,  both  at  the  periphery  of  microaerobic  zones 
and  distant  from  them.  A  fruitful  direction  for  future  research  will  be  to  examine  how 
and when shifts from dominantly anoxygenic to dominantly oxygenic primary
productivity  occurred,  a  process  that,  as  discussed  above,  was  probably  gradual  and 
spatially  heterogeneous. 

The biodiversity implied by this range of biogeochemical processes is considerable, and
suggests that much of microbial metabolism, and many, if not most, of the major
taxonomic lineages, had appeared by the late Archean. The molecular fossil record
provides specific evidence for bacteria and eukaryotes (Chapter 2), methanogenesis
implies the presence of archaea, and the active biogeochemical cycles evident in the
geologic record suggest that life was widely diversified. It seems that the majority of key
biochemical networks, including energy-generating and biosynthetic pathways, were
operative by the late Archean. Studies such as this one are showing that some classically
aerobic biochemical pathways, including some characteristic of eukaryotic biology, are
operative under the very low dissolved oxygen concentrations that may have been
widespread and persistent during the late Archean.

Figure 5. Cartoon representation of biogeochemistry in the late Archean ocean. In this
depiction, localized oxygenic photosynthesis in coastal waters drives a sea-air O2 flux, which is
balanced by a methane flux deriving from methanogenic diagenesis of aerobic net primary
production. As a result, atmospheric chemistry with the same net reaction as aerobic
methanotrophy acts to consume the biogenic gas flux. The activity of oxygenic and microaerobic
marine microbial communities is recorded in coastal sedimentary deposits, such as the Transvaal
Supergroup (sequence stratigraphy at lower right). Other indicated processes include:
hydrothermal inputs of ferrous iron and molecular hydrogen to deep waters, iron-oxidizing
photoautotrophy and sedimentary iron reduction/cycling, atmospheric sulfur photochemistry (the
source of mass-independently fractionated S pools) and aquatic/sedimentary S cycling, volcanic
S and H2 gas fluxes, and hydrogen escape to space.

It  should  be  noted  that  the  production  of  sterols  does  not  strictly  entail  an  oxygenated 
water  column.  It  could  be  argued  that  the  oxygen  used  to  synthesize  sterols  in  the 
Archean  ocean  was  actually  produced  by  the  eukaryotes  themselves,  and  that  their 
presence  in  sedimentary  organic  matter  reflects  only  the  intracellular  availability  of  O2. 

A  remarkable  example  of  eukaryotes  maintaining  intracellular  O2  while  living  in  anoxic 
environments  was  recently  described  by  Esteban  et  al.  (2009),  who  found  that  certain 
freshwater  ciliates  living  just  below  an  oxic-anoxic  transition  zone  sequester  still-active 
chloroplasts  from  algal  prey  in  order  to  provide  an  oxygen  supply  for  aerobic 
biochemistry.  But  while  solely  intracellular  O2  is  a  conceivable  explanation  for  sterol 
production  in  isolation,  all  eukaryotic  photosynthesis  is  ultimately  ‘borrowed’  from 
cyanobacteria,  through  either  endosymbiosis  or  kleptoplasty  as  described  by  Esteban  et 
al. (2009). Hence the presence of photosynthetic eukaryotes of any sort would mean that
oxygenic cyanobacteria had already emerged, and would thus make it more probable that
oxygenation of the water column was underway to some extent.

Accurate  reconstruction  of  the  history  of  oxygen  in  the  atmosphere  and  oceans  and 
evolution  of  the  organisms  and  biochemistry  responsible  for  its  production  are  central  to 
understanding  the  reciprocal  interactions  between  life  and  environments.  The 
demonstration  here  that  steroid  biosynthesis  is  possible  under  microaerobic  (but  not 
anaerobic)  conditions  provides  new  context  for  the  interpretation  of  the  biomarker  record 
and  inferences  about  ocean  biogeochemistry.  The  assumption  that  an  anaerobic 
atmosphere means an anaerobic ocean is misleading, as biologically-sustained deviations
from physical equilibria are common and important today and were in the distant past as
well. Reconstructions of microbial evolution, often framed in the context of global,
long-term trends, should keep in mind the heterogeneity of natural environments.


A multilevel view of gene expression and regulation in the diel cell cycle of Prochlorococcus


Jacob  R.  Waldbauer,  Sebastien  Rodrigue,  Maureen  L.  Coleman  and  Sallie  W.  Chisholm 


The  cell  division  cycle  of  Prochlorococcus  populations  is  tightly  coupled  to  the  diel  cycle  of 
light  and  darkness,  with  chromosome  replication  in  the  late  afternoon  being  followed  by 
cellular fission after sunset. Experiments in culture have shown that the mRNA-level
expression  profiles  of  most  genes  also  have  a  diel  rhythm.  Here  we  quantify  the  abundance 
and  periodicity  of  the  Prochlorococcus  proteome  over  a  diel  cycle  and  compare  transcript- 
level  and  protein-level  expression  patterns.  Strong  diel  oscillations  in  transcript  abundance 
are  broadly  damped  at  the  protein  level,  and  temporal  offsets  between  the  two  suggest  that 
posttranslational  regulatory  mechanisms  are  important  in  determining  the  abundance 
dynamics  of  a  number  of  proteins.  The  overall  composition  of  the  proteome  is  quite  stable 
over  the  diel  cycle,  with  some  proteins  consistently  among  the  most  abundant  in  the  cell  at 
all  times  of  day.  The  small  changes  in  protein  abundance  that  accompany  significant 
changes  in  central  metabolic  activities  imply  that  Prochlorococcus  biochemical  networks 
may  be  poised  near  balance  points  that  allow  for  redirection  of  metabolic  fluxes  with 
relatively  small  changes  in  the  abundance  of  their  components. 


1. Introduction: The importance of the diel cycle to life in the subtropical epipelagic

The  twenty-four  hour  alternation  of  light  and  darkness  caused  by  the  earth’s  rotation  is 
one  of  the  strongest  periodicities  imposed  on  natural  systems.  Since  the  sun  is  the 
ultimate  source  of  nearly  all  energy  input  to  ecosystems,  its  diurnal  motion  has  dramatic 
consequences  for  the  activity  of  surface-dwelling  organisms  ranging  from  bacteria  to 
humans.  This  is  especially  true  for  phototrophs,  whose  physiology  is  directly  tied  to  solar 
energy  utilization.  In  the  ocean,  the  great  majority  of  photosynthesis  is  carried  out  by 
microbes,  whose  small  body  sizes  require  them  to  closely  balance  metabolic  demands  of 
growth  with  the  availability  of  light  energy.  In  the  low-latitude  epipelagic  where 
Prochlorococcus  is  a  dominant  primary  producer,  the  diel  cycle  is  one  of  the  main  time- 
varying  properties  of  the  ecosystem. 




As  the  most  abundant  photosynthetic  organism  on  earth  and  the  base  of  the  food  web  in 
vast  areas  of  the  open  ocean,  Prochlorococcus  is  a  central  player  in  marine 
biogeochemistry.  Shortly  after  its  discovery  and  initial  characterization  in  the  late  1980s, 
a  remarkable  property  of  Prochlorococcus  populations  was  observed:  their  growth  is 
clearly synchronized to the diel cycle. Fortuitously, the tool used to discover and identify
Prochlorococcus  in  the  ocean,  flow  cytometry,  is  also  ideal  for  assessing  cell  cycling,  and 
utilization  of  DNA-binding  fluorescent  dyes  allowed  the  cell  cycle  progression  of  natural 
communities  to  be  tracked  quantitatively.  A  well-conserved  pattern  emerged:  cells 
replicate  their  DNA  in  the  afternoon  and  divide  from  around  sunset  into  the  evening 
(Vaulot et al., 1995). With generally one cell division per day (though see Shalapyonok
et al., 1998), this synchronization results in well-defined B-, C- and D-phases equivalent
to the G1, S and G2+M phases, respectively, of eukaryotic cells (Wang and Levin, 2009).

The  diel  cell  cycle  of  Prochlorococcus  has  one  additional  characteristic  that  has  enabled 
its  detailed  investigation:  it  can  be  reproduced  in  laboratory  culture.  Principal  cell  cycle 
parameters  of  axenic  cultures  maintained  on  a  light-dark  cycle,  including  growth  rate  and 
the  length  of  DNA  synthesis  and  cell  division  phases,  are  similar  to  those  observed  in 
field  populations  -  suggesting  that  the  biology  occurring  in  the  laboratory  is  at  least  a 
first-order  representation  of  what  occurs  in  nature.  A  number  of  studies  have  explored 
aspects  of  the  Prochlorococcus  diel  cell  cycle  using  a  variety  of  experimental 
arrangements, and have added insights into changes in photophysiology (Bruyant et al.,
2005) and optical properties (Claustre et al., 2002) as well as in the expression of selected
photosynthesis- (Garczarek et al., 2001) and cell cycle-related (Holtzendorff et al., 2001)
genes. 

A  further  intriguing  aspect  of  diel  cell  cycling  in  Prochlorococcus  is  the  absence  of  the 
kind  of  true  circadian  clock  that  has  been  documented  in  other  cyanobacteria.  Typical 
cyanobacterial  circadian  clock  systems  are  comprised  of  three  proteins,  KaiA,  KaiB  and 
KaiC.  The  molecular  basis  of  the  clock  is  a  24-hour  oscillation  in  the  phosphorylation 
state of KaiC, with KaiA promoting phosphorylation and KaiB promoting
dephosphorylation.  All  sequenced  Prochlorococcus  strains,  however,  lack  a  homologue 
of  KaiA.  Holtzendorff  et  al.  (2008)  have  demonstrated  that  the  diel  coordination  of 
growth  and  associated  expression  oscillations  damp  rapidly  (at  least  in  terms  of  their 
population  averages)  when  cultures  are  shifted  to  continuous  light,  implying  that  the 
Prochlorococcus  timing  system  relies  on  external  cues  during  each  photoperiod. 

Recently,  Axmann  et  al.  (2009)  showed  that  the  KaiC  of  Prochlorococcus  MED4  is 
hyperphosphorylated  by  default,  apparently  enabling  the  KaiBC-only  system  to  oscillate. 
It  may  be  that  a  streamlined,  less  robust  timing  mechanism  is  sufficient  for 
Prochlorococcus  because  it  inhabits  the  low-latitude  open  ocean,  and  therefore 
experiences  smaller  seasonal  changes  in  day  length  than  phototrophs  at  higher  latitudes  or 
those  in  freshwater  settings  where  shading  by  vegetation  or  sediment  is  more  likely 
(Axmann  et  al.,  2009;  Mullineaux  and  Stanewsky,  2009). 

The  most  comprehensive  molecular  picture  of  the  Prochlorococcus  diel  cell  cycle  to  date 
has  come  from  the  work  of  Zinser  et  al.  (2009),  who  used  microarrays  to  track  gene 
expression genome-wide with 2-hour resolution over two consecutive diel periods in
axenic cultures of Prochlorococcus MED4. This study found that 82% of transcripts of
protein-coding genes had a detectable circadian oscillation of expression. The high
proportion of transcripts cycling with 24-hour periodicity suggests that the diel cycle
could  be  seen  as  the  ‘master  regulator’  of  Prochlorococcus  gene  expression  -  though 
how  this  regulation  is  effected  is  unclear.  The  MED4  genome  encodes  only  a  small 
number  of  the  complement  of  regulatory  systems  typically  found  in  bacteria  (Mary  and 
Vaulot,  2003;  Vogel  et  al.,  2003;  Kielbasa  et  al.,  2007).  Small  RNAs,  including  antisense 
transcripts,  likely  play  important  regulatory  roles  (Steglich  et  al.,  2008;  Richter  et  al., 
2010). 

In  light  of  these  previous  findings,  the  experiment  described  here  was  designed  to  address 
two  key  aspects  of  the  diel  cell  cycle  in  Prochlorococcus.  One  is  the  regulatory 
underpinnings of the observed transcriptional program. The small complement of
regulatory systems in MED4 makes a genome-wide analysis of the dynamics of the
transcriptional  regulatory  network  feasible.  Using  a  combination  of  5’-  and  3’-RACE 
(Rapid  Amplification  of  cDNA  Ends)  and  ChIP-Seq  (Chromatin  Immunoprecipitation 
and  Sequencing)  techniques,  the  binding  sites  and  DNA  association  patterns  of 
transcription  factors  can  be  mapped  over  the  diel  cycle.  This  work  is  in  progress  and  will 
be presented elsewhere (Rodrigue et al., in prep.). This chapter presents the second focus
of  the  diel  growth  experiment:  the  relationship  between  gene  expression  at  the  transcript 
and  protein  levels. 

The  central  dogma  of  molecular  biology  holds  that  gene  sequence  information  in  cells 
generally  flows  from  DNA  to  RNA  to  protein.  While  this  specifies  the  topology  of  the 
chemical  network  involved  in  gene  expression,  it  leaves  open  the  question  of  the  relative 
abundances  of  gene  products.  It  is  clear  that  the  relative  abundance  of  a  gene  at  the  RNA 
level  is  not  readily  predictable  from  its  abundance  at  the  DNA  level:  two  genes  each 
present  at  a  single  copy  in  the  genome  can  be  transcribed  at  very  different  rates.  The 
same  is  true  for  the  translation  of  RNA  into  protein.  Transcript  abundance,  while 
undoubtedly  a  major  factor  influencing  protein  level,  does  not  fully  determine,  predict  or 
explain  it.  The  observed  abundance  of  a  gene  product  reflects  the  dynamic  balance 
between  production  (by  transcription  or  translation)  and  degradation,  hence  the 
motivation  to  complement  the  mRNA-level  picture  of  diel  cyclic  gene  expression  in 
Prochlorococcus  with  a  protein-level  one. 

In  biogeochemical  terms,  to  understand  how  organisms  effect  molecular  transformations 
in  the  environment,  protein  quantification  is  essential.  Proteins  are  the  catalysts  for  many 
biogeochemically-significant  reactions,  and  themselves  represent  a  large  proportion  of 
the nutrient requirements of microbial cells. Large-scale gene expression measurements
are  coming  to  the  fore  as  assays  of  the  activities  and  physiological  states  of  microbial 
communities,  including  investigations  of  the  diel  cycle  in  marine  surface  waters 
(Poretsky et al., 2009). If protein and transcript levels diverge, however, the mRNA-level
view  of  biological  activity  may  be  misleading.  It  is  therefore  increasingly  important  to 
quantify  the  relationships  between  gene  products  at  multiple  biological  levels,  so  that 
accurate  inferences  about  community  function  can  be  drawn  from  biochemical 
measurements. 

We  also  sought  to  address  the  relationship  between  the  proportional  cellular  abundances 
of  mRNAs  and  proteins.  Previous  microarray  analyses  (Zinser  et  al.,  2009)  have 
accurately  resolved  relative  changes  in  mRNA-level  gene  expression  (that  is,  changes  in 
the  abundance  of  a  given  transcript  over  time),  but,  due  to  widely  varying  hybridization 
efficiencies  and  the  potential  for  saturation  of  the  microarray  signal  (Levicky  and 
Horgan,  2005),  it  has  been  difficult  to  compare  the  abundance  of  different  gene  products 
at  a  given  time.  The  use  of  parallel  RNA-sequencing  and  mass  spectrometry  to  measure 
transcript and protein abundance, respectively, allows comparison of abundances between
gene products. Hence we obtain a first view of the rank-abundance structure of
Prochlorococcus  gene  products  at  the  transcript  and  protein  levels  and  how  it  changes 
over  the  diel  cell  cycle.  The  relationships  between  transcript  and  protein  abundance  are 
an  important  component  of  systems-level  understanding  of  cellular  functions  and  the 
biogeochemistry  they  participate  in. 

2.  Experimental  methods  &  technical  discussion 

2.1  Diel  growth  experiment 

2.1.1  Culture  conditions 

Axenic Prochlorococcus MED4 was grown in duplicate batches in 20L acid-cleaned
polycarbonate  carboys  (Nalgene)  in  a  modified  I-66LL  illuminated  incubator  (Percival 
Scientific), depicted in Figure 1A. The illumination in the incubator was controlled by
custom  PID-controlled  dimmer  circuitry  and  programmed  to  match  a  diel  irradiance 
curve from the Hawaii Ocean Time-series station ALOHA, as shown in Figure 1B.
Temperature in the incubator was maintained at 24°C. Each culture was stirred
continuously by a large teflon-coated magnetic stir bar. Prior to inoculation of the
large-volume batch cultures for this experiment, the culture had been maintained for several
months in the same incubator to ensure synchronization to the light-dark cycle. The
culture medium was a modified version of Pro99 (Moore et al., 2007) based on Vineyard
Sound seawater (collected at Woods Hole, MA) and pH-buffered with 10mM HEPES
and 6mM sodium bicarbonate. Starting culture volume for the diel growth experiment
was 20L. Separately, ¹⁵N-labeled Prochlorococcus MED4 cells were prepared by
growing cells from the same axenic strain used in the diel growth experiment on Pro99
medium in continuous light, where the ammonium chloride that is the sole fixed N source
in the medium was >99% ¹⁵NH4Cl (Cambridge Isotope). A 1L culture was grown to
late-exponential phase and harvested by centrifugation as described below, with samples
preserved for flow cytometry to provide accurate cell counts of the pellets.

Figure 1. A: Photograph of Prochlorococcus MED4 batch cultures in sunbox incubator. B: Programmed
and measured irradiance curves for the sunbox. Programmed curve derived from data from the
Hawaii Ocean Time-series (HOT) station ALOHA.

2.1.2  Sampling 

During  the  diel  growth  experiment,  samples  were  taken  every  two  hours  over  a  26  hour 
span,  beginning  at  local  midnight,  resulting  in  14  total  timepoints.  At  each  timepoint, 
500ml  samples  of  each  culture  were  withdrawn  using  spigots  mounted  at  the  bottom  of 
the carboys and dispensed into 2x250ml centrifuge bottles (Nalgene). For sampling
during  dark  and  low-light  periods,  a  low-power  green  lamp  was  used  to  provide  indirect 
work  light.  Additionally,  five  1ml  samples  of  the  culture  were  preserved  with  0.125% 
glutaraldehyde  (Tousimis)  in  cryovials  (Nunc)  for  flow  cytometric  determination  of  cell 
density and growth cycling. Large-volume samples were centrifuged at 16,000xg for 10
minutes  in  an  Avanti  J-10  centrifuge  (Beckman  Coulter).  After  pipetting  off  the 
supernatant,  the  two  pellets  from  each  culture  were  resuspended,  combined  in  a  15ml 
conical tube (VWR), and the volume brought to 5ml with Pro99 media. Duplicate 10µl
aliquots of the resuspended concentrate were diluted 1:100 with 0.125% glutaraldehyde
in Pro99 for flow cytometric analysis to ensure precise and accurate determination of
cell counts in the sample pellets. The remainder of the concentrate was then split into
2x2ml and 4x0.25ml aliquots in 2ml microcentrifuge tubes (Sarstedt), the former for
RNA-sequencing  and  the  latter  for  proteomics.  These  aliquots  were  then  centrifuged  for 
6  minutes  at  14,000xg  in  a  5418  centrifuge  (Eppendorf).  The  supernatant  was  then 
removed and preserved with 5µl 25% glutaraldehyde in cryovials for flow cytometric
counting  to  assess  pelleting  efficiency.  Cell  pellets  and  flow  cytometry  samples  were 
kept  frozen  at  -80°C  until  analysis. 

2.1.3  Flow  cytometry 

Cell  counts  and  cell  cycle  were  analyzed  using  an  InFlux  flow  cytometer  (Cytopeia). 
Glutaraldehyde-fixed  samples  were  diluted  in  filtered  sterile  seawater  to  appropriate 
concentrations.  Light  scatter  and  fluorescence  signals  were  detected  using  a  488nm 
excitation  beam,  triggering  on  forward  (small-angle)  light  scatter  and  counting  cells  on 
the  basis  of  scatter  characteristics  and  chlorophyll  fluorescence.  Cell  cycle  analysis  was 
performed  using  SYBR  Green  (Invitrogen)  to  stain  DNA.  Flow  cytometry  data  were 
analyzed using FlowJo (TreeStar).

2.2  Proteomics  sample  preparation 

2.2.1  Protein  extraction 

Prochlorococcus cell pellets were resuspended in 2x LDS buffer (Invitrogen) with 10mM
dithiothreitol (DTT), vortexed, and incubated in a heat block at 95°C for 20 minutes.
They were subsequently vortexed again and incubated at 37°C for 60 minutes. After
cooling to room temperature, iodoacetamide (0.5M in 100mM NH4HCO3) was added to a
concentration of 45mM and incubated at room temperature in the dark for 60 minutes to
alkylate cysteine residues. DTT (2M) was then added to a concentration of 45mM
(assumed to be 10mM after quenching of residual iodoacetamide). Volumes of
extraction  buffer  and  reduction/alkylation  reagents  were  chosen  to  result  in  a  final 
concentration of 1x10^ extracted cells/50µl. Extract from sample pellets was then mixed
1:1 by cell numbers (established previously by flow cytometry) with extract from an
identically and simultaneously processed ¹⁵N-labeled cell pellet.


2.2.2  SDS-PAGE 

Extract from up to 1x10^ total cells was loaded per lane into 10-lane, 10cm x 1.5mm 10%
SDS-PAGE  gels  (NuPAGE,  Invitrogen)  with  MOPS  running  buffer  (Invitrogen).  Gels 
were  run  at  50mA  per  gel  with  water  cooling  to  25°C  in  a  vertical  format  (SE260, 
Hoefer)  for  90  minutes.  Benchmark  protein  ladder  (Invitrogen)  was  used  as  a  molecular 
weight  standard.  After  electrophoresis,  gels  were  removed  from  their  cases  and  fixed  in 
50%  ethanol/7.5%  acetic  acid  overnight  on  an  orbital  shaker.  They  were  then  rehydrated 
in Milli-Q water for 30 minutes before staining with SimplyBlue coomassie stain
(Invitrogen). Gels were imaged on a flatbed scanner prior to slicing with a razor blade
into ~1mm cubes. Eight separate molecular weight fractions were prepared for each
timepoint.  The  fastest-running  component  of  the  extract  (the  bottom-most  band,  rich  in 
chlorophyll  as  evidenced  by  its  green  color  during  electrophoresis)  was  excised  and  not 
included  in  further  processing.  The  gel  slices  were  transferred  to  1.5ml  microcentrifuge 
tubes  (Axygen)  and  destained  by  shaking  for  4  hours  in  50%  ethanol/7.5%  acetic  acid  at 
37°C.  The  destain  solution  was  then  changed  and  the  gel  pieces  returned  to  the 
incubator/shaker  overnight. 

2.2.3  Trypsin  digestion 

The destained gel pieces were washed five times: twice alternating acetonitrile/50mM
ammonium  bicarbonate  for  20  minutes  each,  then  a  final  acetonitrile  wash  until  fully 
dehydrated.  The  acetonitrile  was  then  removed  and  the  dry  pieces  chilled  on  ice. 
Sequencing  grade  modified  porcine  trypsin  (Promega)  was  diluted  in  10%  acetonitrile  in 
50mM  ammonium  bicarbonate  and  chilled  on  ice.  Sufficient  trypsin  solution  was  added 
to the sample tubes to cover the gel pieces (~100-200µl/tube), and the incubation
continued on ice until the gel pieces were clear. The tubes were then incubated for 24
hours at 37°C, at which point 200µl of 10% acetonitrile in 50mM ammonium bicarbonate
was  added  and  incubated  for  a  further  24  hours  at  37°C. 

2.2.4  Peptide  extraction 

After  incubation  with  trypsin  for  48  hours,  the  liquid  from  the  digestion  tubes  was 
transferred to a new set of 1.5ml microcentrifuge tubes, which were kept at 4°C. 200µl of
10%  acetonitrile  in  50mM  ammonium  bicarbonate  was  added  to  the  pieces  and  incubated 
for  1  hour  at  37°C  before  being  pooled  with  the  initial  extract.  This  process  was  repeated 
once  more  with  10%  acetonitrile  in  50mM  ammonium  bicarbonate  and  then  twice  with 
50% acetonitrile in 0.1% formic acid; total extract volume was ~1ml. Extracts were
frozen solid at -80°C before being concentrated in a vacuum centrifuge for 4-5 hours at
30°C. The dried extracts were then resuspended in 500µl of 5% acetonitrile in 0.25%
formic acid, vortexed vigorously, and centrifuged for 10 minutes at 14,000xg to pellet
residual gel fragments and debris. The supernatant was removed, frozen solid at -80°C
and  finally  concentrated  by  vacuum  centrifuge  for  ~2  hours  at  30°C. 

2.3  LC-MS/MS  analysis 

2.3.1  Liquid  chromatography/nanoelectrospray  ionization 
Frozen, dried peptide fractions (8 per timepoint) were resuspended in 10µl of 5%
acetonitrile in 0.25% formic acid and transferred to a 384-well plate (Nunc). 6µl of each
sample was injected via qualitative µl pickup with a MicroAS autosampler
(Thermo/Spark Holland) fitted with a 10µl sample loop. Mobile phase was delivered by
a Surveyor LC pump (Thermo) fitted with a backpressure regulator (P-880, Upchurch)
and a fixed-T flow splitter (ratio ~90:1). Peptides were loaded onto a reversed-phase
capillary LC column (Hypersil Gold C18, 0.18 x 100mm, 3µm x 175Å particles, Thermo).
The  mobile  phase  system  for  the  LC  consisted  of  0.1%  formic  acid  in  water  (buffer  A) 
and 0.1% formic acid in acetonitrile (B). Solvents for LC-MS were HPLC grade
(Burdick  &  Jackson).  Peptides  were  eluted  with  a  gradient  of  5%  to  37.5%  B  over  105 
minutes, followed by a ramp to 100% B and washing for 25 minutes, then 40 minutes
re-equilibration with 5% B. Column flow rate was 1.2µl/min, and total LC cycle time was
180 minutes. The LC system was interfaced to the mass spectrometer through a
TriVersa Nanomate nanospray source/fraction collector (Advion). Post-column flow
split  between  the  nanospray  chip  and  the  collector  mandrel  was  1:2,  resulting  in  a  flow  of 
~400nl/min  to  the  chip.  The  spray  chip  was  operated  at  a  voltage  of  1.6-1.8kV,  and  spray 
current  was  monitored  to  ensure  ionization  stability. 

2.3.2  Mass  spectrometry 

Mass spectral data was acquired on an LTQ-FT Ultra (Thermo) in a data-dependent
manner.  Each  full  scan  in  the  ICR  cell  (profile  mode,  m/z  300-1600,  resolution  100,000) 
was  followed  by  4  CID  MS/MS  scans  on  selected  precursors  in  the  linear  ion  trap. 
Dynamic  exclusion  was  enabled,  with  repeat  count  set  to  2  and  exclusion  duration  30 
seconds.  Singly  charged  ions,  ions  whose  charge  state  could  not  be  assigned,  and 
common contaminant ions were excluded from MS/MS precursor selection. The
LTQ-FT ion optics were tuned on a mixture of angiotensin (Calbiochem) and bradykinin
(Genscript)  for  best  sensitivity  and  transmission  of  peptide  ions. 

2.3.3  Peptide/protein  identification 

Mass  spectral  data  acquired  by  Xcalibur  (v.  2.0,  Thermo  .RAW  format)  was  converted  to 
mzXML format with ReAdW (v. 4.3.1). Peptide-spectrum matching was done against a
database consisting of the Prochlorococcus MED4 genome (Rocap et al., 2003), its
reversed complement, and a set of common contaminant proteins including porcine
trypsin and human keratins. Three MS/MS database search algorithms were employed:
X!Tandem (v. TORNADO, Craig and Beavis, 2004; with the k-score plugin, MacLean et
al., 2006), MyriMatch (v. 060509, Tabb et al., 2007) and OMSSA (v. 2.1.7, Geer et al.,
2004). For all search engines, semi-tryptic searches were conducted with two missed
cleavages allowed. Amino acid modifications included static carbamidomethylated
cysteine, variably oxidized methionine, and variable formation of pyro-glutamine,
-glutamate or -carbamidomethylcysteine on the N-termini of peptides. For OMSSA
searching,  a  concatenated  DTA  file  (.odta  format)  was  first  produced  from  each  mzXML 
file  using  mzXML2Search,  and  spectra  were  then  searched  on  the  Darwin  computing 
cluster  at  MIT.  Output  from  the  3  search  engines  was  converted  (if  necessary)  to 
pepXML  format  and  a  Perl  script  used  to  unify  their  formats.  Each  pepXML  file  was 
then processed with PeptideProphet (Keller et al., 2002) to assign probabilities to each
peptide identification; peptides longer than 6 amino acids were considered, and accurate
mass information was used. The peptide identifications in the 24 pepXML files from
each timepoint (8 gel slices x 3 search engines) were merged using iProphet (Shteynberg
et al., in prep; all models enabled). A Perl script was used to reindex the merged
pepXML file, to ensure that each peptide ID had a unique index value. Peptide
identifications were then assigned to proteins by ProteinProphet (Nesvizhskii et al., 2003)
using  Occam’s  razor  logic,  and  requiring  a  minimum  PeptideProphet  score  of  0.05  to 
filter out the weakest spectrum IDs. Further filtering based on an arbitrary
ProteinProphet score cutoff was not necessary at this point (except to exclude those with
scores of 0.00), as the false discovery rate (FDR) was already acceptably low: 33 decoy
proteins were identified among 1021 MED4 proteins, for a nominal dataset-wide
protein-identification FDR of 3.2% (Supplementary Fig. 2). If only proteins identified at 2 or
more timepoints are considered, the FDR was 1.6% (14 decoys among 873 MED4
proteins). As discussed below (Sec. 2.4.1), the process of constructing expression
timecourses eliminated all decoy data, resulting in a protein-ID FDR in the time-series
data  of  <0.2%. 
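
The decoy-based FDR values quoted above follow from simple ratios of decoy to MED4 identifications. A minimal sketch of that arithmetic is given here; the function name `decoy_fdr` is hypothetical and is not part of any tool used in this work, and the bound of 1/548 uses the number of proteins with timecourses reported later in this section.

```python
def decoy_fdr(n_decoy: int, n_target: int) -> float:
    """Nominal FDR estimated as decoy identifications divided by target identifications."""
    if n_target <= 0:
        raise ValueError("n_target must be positive")
    return n_decoy / n_target


# Counts taken from the text: all proteins, then proteins seen at 2 or more timepoints.
full_dataset = decoy_fdr(33, 1021)
two_or_more = decoy_fdr(14, 873)
# With zero decoys among 548 timecourse proteins, the FDR is bounded by one hypothetical decoy.
timecourse_bound = 1 / 548

print(f"{100 * full_dataset:.1f}%  {100 * two_or_more:.1f}%  <{100 * timecourse_bound:.2f}%")
```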

2.4  Protein  timecourse  quantification 

2.4.1  Isotope-labeling-based  quantification 

The  method  of  quantification  used  here  to  construct  timecourses  of  protein  abundance 
over  the  diel  sampling  period  is  based  on  the  use  of  isotopically-labeled  internal 
standards. Metabolic isotope labeling, here using ¹⁵N, affords an internal standard for
virtually every peptide observed in the sample. Mixing the samples taken at each diel
sampling timepoint with aliquots of identical ¹⁵N-labeled cells at the earliest stage of
protein extraction minimizes the effect of subsequent experimental biases (protein
extraction yields, peptide ionization efficiencies, etc.) on the observed abundance ratio.
Accurate cell counts ensure that the mixing ratio was near 1:1. Analysis of
quantitative proteomics data using ¹⁵N-labeling does present challenges, however, not the
least of which is the variable mass difference between sample and standard. Details of
the method used to construct timecourses of protein abundance over the diel cycle are
described below.

MS¹ peaks corresponding to ¹⁵N-labeled versions of identified MED4 peptides were
matched to their unlabeled, coeluting partners by ASAPRatio (Li et al., 2003), and the
abundance ratios of the ¹⁵N- versus ¹⁴N-peaks were computed. A window of 0.05 m/z
was  used  for  integration,  background  correction  was  set  to  zero,  multiple  charge  states 
were  used  for  quantification,  and  peakgroup  pairs  were  constrained  to  the  same  elution 
time  range.  ASAPRatio  estimates  an  integrated  intensity  error  for  each  peak  by  the 
difference  between  the  integrated  raw  intensity  and  the  integrated  area  under  a  fitted  peak 
generated  by  a  Savitzky-Golay  smoothing  filter;  this  error  was  used  to  calculate  a 
coefficient  of  variation  (CV)  for  each  individual  peakgroup  ratio.  The  set  of  quantified 
peakgroup  ratios  for  the  entire  dataset  was  extracted  from  the  protXML  files  for  each 
timepoint,  and  a  data  matrix  constructed  and  subsequent  analyses  performed  in 
MATLAB  (MathWorks).  Redundant  values  for  the  same  physical  LC-MS  feature  were 
removed  by  requiring  a  5%  integrated  intensity  difference  for  duplicate,  identical  ratio 
values  for  a  given  peptide  at  a  given  timepoint.  The  total  dataset  comprised  isotope  ratios 
for  95,542  unique  peakgroup  pairs  -  here  referred  to  as  “peaks”  for  brevity,  but  actually 
representing in excess of 500,000 distinct LC-MS features, including both ¹⁴N- and
¹⁵N-partners and ¹³C-isotopologues for each peptide across multiple charge states.

The set of unique peak ratios was then filtered to remove peaks with a ratio CV greater
than 41%, corresponding to the 80th percentile in CV ranking. These high-CV peaks are
generally  of  low  intensity  and/or  poor  peak  shape,  making  them  more  prone  to 
quantification  errors.  The  remaining  data  were  then  log2-transformed,  and  the  peak  ratios 
were adjusted to compensate for slight variations away from 1:1 in the mixing ratio of
¹⁴N- and ¹⁵N-protein at different timepoints. This was done by finding the median of the
log2-peak ratios between -1 and 1 for each timepoint, and subtracting that value from
each peak ratio at that timepoint. This normalization procedure ensures that the peak
ratio distributions for each timepoint have a common median (i.e., 0 on a logarithmic
scale) and that timecourse variations for individual proteins are not due to systematic
biases  in  the  overall  dataset. 
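
The CV filter and median normalization described above can be sketched for a single timepoint as follows. This is an illustrative Python sketch, not the MATLAB code used for the analysis; the function name `normalize_timepoint` and its arguments are hypothetical, and CVs are assumed to be expressed as fractions.

```python
import math
import statistics


def normalize_timepoint(ratios, cvs, cv_cutoff=0.41):
    """Return CV-filtered, median-centred log2 peak ratios for one timepoint.

    ratios: raw peak ratios (positive floats)
    cvs:    matching ratio CVs as fractions (0.41 corresponds to 41%)
    """
    # Drop peaks whose ratio CV exceeds the cutoff, then move to a log2 scale.
    kept = [math.log2(r) for r, cv in zip(ratios, cvs) if cv <= cv_cutoff]
    # The offset is the median of the log2 ratios lying between -1 and 1.
    central = [x for x in kept if -1.0 <= x <= 1.0]
    if not central:
        raise ValueError("no peak ratios within [-1, 1] on the log2 scale")
    offset = statistics.median(central)
    return [x - offset for x in kept]


example = normalize_timepoint([1.0, 2.0, 4.0, 0.5, 8.0], [0.1, 0.2, 0.3, 0.5, 0.1])
print(example)
```

In this toy input the fourth peak is removed by the CV filter, and the remaining log2 ratios are shifted by the median of the two values that fall between -1 and 1.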

Protein-level  expression  timecourses  were  constructed  from  the  peak-level  data  in  the 
following  manner:  for  a  given  protein,  if  four  or  more  peaks  were  found  at  eight  or  more 
timepoints,  those  timepoints  were  included  in  the  timecourse.  If  a  timecourse  could  not 
be  constructed  with  those  criteria,  the  threshold  number  of  peaks  required  for  that  protein 
was  sequentially  lowered  (to  3,  2,  or  1)  until  8  or  more  timepoints  were  included.  This 
approach  was  chosen  to  maximize  the  quality  of  the  extracted  timecourses  for  data-rich 
proteins,  but  to  also  allow  timecourses  to  be  constructed  for  proteins  consistently  detected 
at  low  levels.  For  timepoints  with  at  least  4  peaks,  the  set  of  peak  ratios  was  tested  for 
outliers using a variation of the integrated inconsistent rate (IIR) method (Hsiao et al.,
2009), with an IIR cutoff value of 1.81.
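
The rule for choosing which timepoints enter a protein timecourse can be sketched as below. This is a hedged illustration only: `select_timepoints` and the dictionary layout are hypothetical names introduced here, and the outlier test applied to timepoints with at least 4 peaks is not included.

```python
def select_timepoints(peaks_per_timepoint, min_timepoints=8, start_threshold=4):
    """Pick the highest peak-count threshold (4, 3, 2, 1) that yields enough timepoints.

    peaks_per_timepoint: dict mapping timepoint (hours) -> number of filtered peaks.
    Returns (threshold, sorted timepoints), or None if no timecourse can be built.
    """
    for threshold in range(start_threshold, 0, -1):
        selected = sorted(t for t, n in peaks_per_timepoint.items() if n >= threshold)
        if len(selected) >= min_timepoints:
            return threshold, selected
    return None


# Toy peak counts for the 14 sampling times (every 2 hours from 0 to 26).
counts = {0: 5, 2: 4, 4: 3, 6: 3, 8: 2, 10: 4, 12: 6,
          14: 3, 16: 1, 18: 3, 20: 0, 22: 4, 24: 5, 26: 2}
result = select_timepoints(counts)
print(result)
```

For these toy counts only six timepoints have four or more peaks, so the threshold drops to three, at which point ten timepoints qualify.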

Protein expression ratios were calculated from the peak-level data at each timepoint by
maximum  likelihood  estimation  of  the  parameters  of  a  lognormal  fit  to  the  data,  taking 
the  mean  of  the  lognormal  distribution  as  the  protein  ratio.  The  uncertainty  in  the  ratio 
was  taken  as  the  upper  and  lower  95%  confidence  limits  calculated  from  an  unbiased 
estimate  of  the  standard  error  of  the  mean  (Gurland  and  Tripathi,  1971).  Thus  the 
protein-level uncertainty is given by $\pm 1.96\,c_N s/\sqrt{N}$, where s is the standard deviation
of the lognormal fit to the peak-level data, N is the number of (filtered) peaks observed
for a given protein at a given time, and the factor c_N is taken from Table 2 of Gurland and
Tripathi (1971).
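
A minimal sketch of the protein-level ratio and its 95% confidence half-width is given below. Several points are assumptions made for illustration: the peak ratios are taken to be already log2-transformed, the protein ratio is taken as the mean in log space, s is computed with an N-1 denominator, and c_N is computed from the standard gamma-function expression for the unbiased estimate of a normal standard deviation rather than read from the published table. The function names are hypothetical.

```python
import math
import statistics


def c_n(n: int) -> float:
    """Bias-correction factor for the sample standard deviation of n normal values."""
    if n < 2:
        raise ValueError("n must be at least 2")
    # lgamma is used instead of gamma to stay finite for large n.
    return math.sqrt((n - 1) / 2.0) * math.exp(
        math.lgamma((n - 1) / 2.0) - math.lgamma(n / 2.0)
    )


def protein_ratio(log2_peak_ratios):
    """Return (mean log2 ratio, 95% confidence half-width) for one protein at one timepoint."""
    n = len(log2_peak_ratios)
    if n < 2:
        raise ValueError("need at least two peaks to estimate an uncertainty")
    mean = statistics.fmean(log2_peak_ratios)
    s = statistics.stdev(log2_peak_ratios)  # N-1 denominator (assumption)
    half_width = 1.96 * c_n(n) * s / math.sqrt(n)
    return mean, half_width


mean, half_width = protein_ratio([0.0, 0.2, -0.2, 0.4])
print(round(mean, 3), round(half_width, 3))
```

The correction factor approaches 1 as N grows, so it matters mainly for proteins quantified from only a few peaks.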


Finally,  70  outlier  timepoints  (1.1%  of  the  6157  total)  were  excluded  from  protein 
expression  timecourses  by  measuring  the  ratio  difference  between  each  point  in  a 
timecourse and its nearest neighbors in time. Timepoints whose summed
nearest-neighbor distances exceeded an empirically determined threshold of 4.15
times the mean value for that timecourse were identified as outliers. This
timepoint  outlier  detection  method  does  not  assume  any  underlying  model  for  temporal 
structure  in  the  data.  The  effect  of  the  various  filtering  steps  on  dataset  size  is 
summarized  in  Table  1.  The  filtered  dataset  used  for  timecourse  construction  included 
66,186  peak  ratios,  or  69.2%  of  the  full  dataset  before  filtering.  Ultimately,  expression 
timecourses over the diel cycle were constructed for 548 proteins, which are shown in
Supplementary Figure 1.
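
The timepoint outlier test can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the treatment of the first and last timepoints (which have only one neighbor) is not specified in the text, so here the single neighbor distance is counted twice, and the function name `timepoint_outliers` is hypothetical.

```python
def timepoint_outliers(values, factor=4.15):
    """Return indices whose summed nearest-neighbor distance exceeds factor x the mean.

    values: protein log ratios ordered in time, with no missing entries.
    """
    n = len(values)
    if n < 3:
        return []
    dists = []
    for i in range(n):
        left = abs(values[i] - values[i - 1]) if i > 0 else None
        right = abs(values[i] - values[i + 1]) if i < n - 1 else None
        # Assumption: endpoints reuse their single neighbor distance for both sides.
        if left is None:
            left = right
        if right is None:
            right = left
        dists.append(left + right)
    mean_dist = sum(dists) / n
    if mean_dist == 0:
        return []
    return [i for i, d in enumerate(dists) if d > factor * mean_dist]


# A flat 14-point timecourse with a single spike at index 6.
flagged = timepoint_outliers([0.0] * 6 + [5.0] + [0.0] * 7)
print(flagged)
```

Only the spike itself is flagged in this toy series; its two neighbors have elevated distances but remain below the threshold.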


Filtering step       Unique LC-MS       Protein-by-time    Detected/quantified   Proteins @ 14
                     peakgroup pairs    datapoints         proteins              timepoints

Full diel dataset    95542 (147/0.15%)  8812 (85/0.96%)    1021 (33/3.2%)        360 (0/0%)
Peak CV              76434 (72/0.09%)   8181 (43/0.53%)    967 (25/2.6%)         307 (0/0%)
Peaks/timepoint      73185              6157               548 (0/<0.18%)        170
IIR peak outliers    66661              6157               548                   170
Timepoint outliers   66186              6087               548                   154

Table 1. Effect of filtering steps on proteomics dataset size and false discovery rate
(FDR). Numbers in parentheses indicate the number of reverse decoy identifications
and the calculated FDR expressed as a percentage. No decoy proteins satisfied the
minimum of 8 timepoints after peak CV filtering; protein-ID FDR for the diel timecourses
is thus estimated at <1/548, or <0.18%.

Table 1 also shows the effect of data filtering on the false discovery rate (FDR). As
noted  above,  the  FDR  for  protein  identification  in  the  full  dataset  was  3.2%  (33/1021), 
but  decoys  amounted  to  only  <1%  of  the  protein-by-time  datapoints,  and  even  less 
(0.15%) of the total number of peaks. In the unfiltered dataset, the greatest number of
timepoints  at  which  any  one  decoy  protein  was  identified  was  9,  and  the  majority  (19/33) 
were found at only one timepoint (Supplementary Fig. 2). Filtering on the basis of
¹⁴N/¹⁵N ratio CV eliminated more than half of the decoy peaks, as expected if most of
them  are  spurious  identifications  of  features  without  a  true  isotopic  partner  of  appropriate 
mass  difference.  After  peak  CV  filtering,  no  decoy  protein  was  found  at  more  than  5 
timepoints;  hence  no  decoys  passed  the  8-timepoint  requirement  for  timecourse 
construction. Thus, at least on the basis of decoy hits, the estimated protein-ID FDR for
the timecourse data is bounded at <0.18% (<1/548).

2.4.2 Analysis of diel expression cycling

Analysis  of  the  temporal  cycling  of  protein  expression  followed  an  approach  based  on 
that  outlined  by  Futschik  and  Herzel  (2008).  For  the  548  proteins  quantified  over  the  diel 
timecourse, any missing timepoint values were imputed using k-nearest neighbors
(Troyanskaya et al., 2001), with the missing data inferred as the weighted average of the
two  closest  timepoints  by  Euclidean  distance  in  expression  profile.  The  imputed  data 
were  used  solely  for  the  purpose  of  assessing  the  significance  of  expression  cycling.  The 
protein  timecourses  were  then  standardized  to  mean  =  0  and  standard  deviation  =  1.  Two 
tests  were  then  applied  to  identify  proteins  whose  expression  cycled  over  the  diel. 

First, Fourier scores were calculated for all timecourses on the basis of a 24-hour cell
cycle,  as  in  Zinser  et  al.  (2009).  To  assess  the  significance  of  a  given  Fourier  score  value, 
a  background  distribution  of  scores  from  1000  simulated  first-order  autoregressive 
(AR(1))  timecourses  was  generated  for  each  protein,  using  autoregression  coefficients 
and  white  noise  distributions  derived  from  the  protein  timecourse.  Futschik  and  Herzel 
(2008)  have  shown  that  an  AR(1)  background  model  is  a  more  stringent  and  specific  test 
of cyclic expression than Gaussian or randomized backgrounds. From comparison with
these background distributions, a p-value (designated pf) was derived for each protein for
the null hypothesis of no significant 24-hour cycling. Second, the highest autoregression
coefficient of each protein timecourse (allowing a maximum lag of 2 timepoints) was
compared with a distribution of autoregression coefficients from 1000 random white-noise
timecourses of equal variance. We implemented this procedure as a
complementary check for time-coherent signals because the relative noisiness of the
proteomics data resulted in occasional artifacts in the Fourier score calculation. This test
directly addressed the autoregressive nature of the timecourse data, generating a p-value
(designated pa) for the null hypothesis of mutual independence of the timepoints.
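
The first test can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the analysis: the Fourier score is taken in its commonly used form (the magnitude of the sine and cosine projections at the 24-hour frequency), the AR(1) coefficient is estimated from the lag-1 autocorrelation, each simulated series is standardized before scoring, and the p-value uses an add-one correction. All function names are hypothetical.

```python
import math
import random


def standardize(values):
    """Rescale a series to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def fourier_score(values, hours, period=24.0):
    """Magnitude of the projection onto sine and cosine of the given period."""
    omega = 2.0 * math.pi / period
    s = sum(math.sin(omega * t) * v for t, v in zip(hours, values))
    c = sum(math.cos(omega * t) * v for t, v in zip(hours, values))
    return math.hypot(s, c)


def lag1_coefficient(z):
    """Lag-1 autocorrelation of a standardized (mean-zero) series."""
    return sum(a * b for a, b in zip(z[:-1], z[1:])) / sum(v * v for v in z)


def ar1_pvalue(values, hours, n_sim=1000, period=24.0, seed=0):
    """Fraction of simulated AR(1) series scoring at least as high as the data."""
    rng = random.Random(seed)
    z = standardize(values)
    observed = fourier_score(z, hours, period)
    coef = lag1_coefficient(z)
    # Innovation sd chosen so the simulated process has unit stationary variance.
    noise_sd = math.sqrt(max(1.0 - coef ** 2, 1e-12))
    exceed = 0
    for _sim in range(n_sim):
        series = [rng.gauss(0.0, 1.0)]
        for _step in range(len(z) - 1):
            series.append(coef * series[-1] + rng.gauss(0.0, noise_sd))
        if fourier_score(standardize(series), hours, period) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Example: 14 timepoints sampled every 2 hours from a noise-free 24-hour cosine.
hours = [2.0 * i for i in range(14)]
cosine = [math.cos(2.0 * math.pi * t / 24.0) for t in hours]
p_f = ar1_pvalue(cosine, hours)
print(f"p_f for a pure cosine: {p_f:.4f}")
```

A noise-free cosine scores close to the maximum attainable for a standardized 14-point series, so very few AR(1) simulations match it and the resulting p-value is small.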


The p-values for the two tests were combined using the truncated product method
(Zaykin et al., 2002), with a cutoff of pa<0.05 (no truncation was applied to the pf values).
This cutoff was chosen to reflect the expectation of significant autocorrelation in
timecourses of cyclic expression; since the two tests are not independent, the combined
test score had to be corrected for the covariance of the two p-values (Brown, 1975). The
combined  test  statistic  Xt  is  then  given  by 


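\[
X_t =
\begin{cases}
\dfrac{-2\ln\left(\prod_{i=f,a} p_i\right)}{1 + \dfrac{\mathrm{Cov}\left(-2\ln p_f,\,-2\ln p_a\right)}{2k}}, \quad k = 2, & p_a < 0.05 \\[2ex]
-2\ln p_f, \quad k = 1, & p_a \geq 0.05
\end{cases}
\]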


Combined p-values px were derived from the combined test statistic assuming it follows
a χ² distribution with 2k degrees of freedom when the joint null hypothesis is true. A
false discovery rate approach was then used for multiple hypothesis testing with the
program QVALUE (v. 1.1, Storey and Tibshirani, 2003) in the R statistical environment
(v. 2.10.0, R Development Core Team). From the distribution of 548 px values, a
q-value was computed for each timecourse; QVALUE parameters included the smoother
method for π0 with 5 degrees of freedom and the robust q-value method.
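
The combination step can be illustrated with a short Python sketch. It assumes, as stated above, that the combined statistic is referred to a χ² distribution with 2k degrees of freedom; for even degrees of freedom the survival function has an exact closed form, so no statistics library is needed. The way the covariance enters (dividing the statistic by 1 + Cov/2k, in the spirit of Brown's scaling) is an assumption of this sketch, and the names `chi2_sf_even_df` and `combined_p` are hypothetical.

```python
import math


def chi2_sf_even_df(x, k):
    """Exact survival function of a chi-square with 2k degrees of freedom."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))


def combined_p(p_f, p_a, cov=0.0, cutoff=0.05):
    """Combine the Fourier (p_f) and autoregression (p_a) p-values.

    p_a contributes only when it falls below the truncation cutoff.
    """
    if p_a < cutoff:
        k = 2
        stat = -2.0 * (math.log(p_f) + math.log(p_a))
        # Assumed Brown-style scaling for dependence between the two tests.
        stat /= 1.0 + cov / (2.0 * k)
    else:
        k = 1
        stat = -2.0 * math.log(p_f)
    return chi2_sf_even_df(stat, k)


# p_a above the cutoff: the combined p-value reduces to p_f.
print(combined_p(0.03, 0.20))
# Both tests significant and treated as independent (cov = 0).
print(combined_p(0.01, 0.01))
# Positive covariance makes the combined evidence weaker.
print(combined_p(0.01, 0.01, cov=1.0))
```

With k = 1 the χ² survival function of -2 ln pf is pf itself, which is why truncating pa leaves the Fourier p-value unchanged.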

The time of peak abundance for each protein timecourse was found by fitting a series of
shifted cosine curves to the standardized expression timecourses (Zinser et al., 2009).
The abundance peak was identified as the phase shift of the best-fit cosine curve by
minimizing the residual between the cosine and the data:

\[
R(\phi) = \sum_{t=1}^{14} \left( P_s(t) - \frac{1}{\sigma_{\cos}} \cos\left( \frac{\pi (t-1)}{6} - \frac{\pi \phi}{12} \right) \right)^{2}
\]

where t is the timepoint, Ps(t) is the standardized protein timecourse, σcos is the standard
deviation of the cosine over the range 0 to 2π, and φ is the phase (an integer between 0
and 23, inclusive). The value of φ that produced the minimum value of R was considered
the phase (time of peak abundance) of the timecourse. This procedure yielded estimates
of the phasing of protein expression with hourly resolution.
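
A minimal Python sketch of this phase search is shown here. It assumes a squared residual, timepoints spaced 2 hours apart, and a phase expressed in hours after the first timepoint; the source does not state how the phase is referenced to clock time, so that convention and the function name `fit_phase` are illustrative.

```python
import math


def fit_phase(z, hours_per_step=2.0, period_hours=24):
    """Return the integer phase (in hours) of the best-fit unit-variance cosine."""
    sigma_cos = 1.0 / math.sqrt(2.0)  # sd of cos(x) over one full period
    best_phase, best_residual = None, math.inf
    for phase in range(period_hours):
        residual = 0.0
        for index, value in enumerate(z):
            hours = hours_per_step * index  # hours since the first timepoint
            model = math.cos(2.0 * math.pi * (hours - phase) / period_hours) / sigma_cos
            residual += (value - model) ** 2
        if residual < best_residual:
            best_phase, best_residual = phase, residual
    return best_phase


# Example: a synthetic timecourse peaking 8 hours after the first timepoint.
raw = [math.cos(2.0 * math.pi * (2.0 * i - 8.0) / 24.0) for i in range(14)]
mean = sum(raw) / len(raw)
sd = math.sqrt(sum((v - mean) ** 2 for v in raw) / len(raw))
standardized = [(v - mean) / sd for v in raw]
print(fit_phase(standardized))
```

Searching all 24 integer phases directly is cheap at this scale and matches the hourly resolution described above.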

2.5  Quantification  of  fractional  cellular  protein  &  transcript  abundance 




The  foregoing  analysis  (Sec.  2.4)  was  designed  to  provide  a  high-resolution  picture  of  the 
timecourse  expression  of  individual  genes  over  the  diel  cycle.  The  quantification  results 
based  on  isotope  labeling  enable  comparison  of  the  abundance  of  a  given  gene  between 
timepoints, and the presence of a consistent internal standard (the 15N-labeled peptides)
allows  relatively  small  changes  in  expression  to  be  resolved,  even  for  proteins  that  are  not 
expressed  at  high  levels.  It  is  also  informative,  however,  to  compare  the  abundances  of 
different  proteins  to  each  other,  both  within  and  across  samples.  This  allows,  for 
instance,  one  to  see  that  protein  X  is  present  at  n  times  as  many  copies  per  cell  as  protein 
Y  in  a  given  sample.  The  approach  adopted  here  to  assess  the  cellular  abundances  of 
different  proteins  is  based  primarily  on  spectral  counting.  Spectral  counting 
quantification  stems  from  the  empirical  correlation  between  the  abundance  of  a  protein  in 
a  sample  and  number  of  times  MS-MS  fragmentation  spectra  of  peptides  of  that  protein 
are  observed  in  a  given  dataset.  The  strength  of  this  correlation  is  highly  dependent  on  a 
host  of  chemical  characteristics  of  the  protein  and  its  peptides  as  well  as  on  the 
particulars of the analytical setup used to collect the mass spectral data. The utility of
spectral counting as a mode of quantification is greatly enhanced by computational
modeling of, and correction for, these experimental biases (Mallick et al., 2007). In this
work, we used a computational approach to spectral counting termed APEX (Lu et al.,
2007; Vogel and Marcotte, 2008) for Absolute Protein Expression.

The  term  “absolute  abundance”  is  somewhat  problematic.  What  spectral-counting 
techniques such as APEX calculate is in fact the fraction of the total amount of protein
detected  in  a  sample  comprised  by  any  one  gene  product.  Thus  it  is  still  a  form  of 
relative  quantification,  though  it  is  protein  X  relative  to  protein  Y  in  sample  A,  as 
opposed  to  protein  X  in  sample  A  relative  to  protein  X  in  sample  B.  The  latter  is  the  kind 
of  quantification  obtained  by  isotope  labeling  methods.  Spectral  counting  methods  can 
be  (and  often  are)  used  successfully  for  intersample  comparisons  of  protein  abundance, 
particularly for the more abundant components of the proteome (Hendrickson et al.,
2006).  Here  we  use  spectral  counts  for  protein-to-protein  relative  abundance 
measurements.  These  can  be  converted  to  nominally  “absolute”  units  if  the  total  amount 
of  protein  in  the  cell  is  known;  for  example,  if  the  APEX  score  of  a  protein  is  0.005, 
meaning  it  comprises  0.5%  of  all  (detected)  protein,  and  there  are  500,000  protein 
molecules  in  the  cell  (estimated  from  cellular  N  content,  amino  acid  analysis,  or 
colorimetric assays such as the Bradford, Lowry or BCA), we could say that there are
2500  copies  of  that  protein  in  the  average  cell  in  that  sample.  However,  APEX  is  blind  to 
all  the  non-detected  proteins,  which  are  generally  numerous  and,  as  noted  in  Section 
3.5.1,  not  necessarily  of  low  cellular  abundance.  This  is  what  makes  spectral-count- 
based  “absolute”  quantification  qualitatively  different  from  the  use  of  synthetic, 
absolutely-quantified  peptide  standards:  the  latter  affords  a  measurement  of  the  molar 
abundance  of  one  molecular  species  irrespective  of  the  presence  of  others,  and  is  thus 
truly  absolute.  Since  APEX  has  an  inherently  limited  analytical  window,  we  chose  not  to 
convert  its  fractional  abundance  results  to  quasi-absolute,  copies-per-cell  values,  as  those 
will  inevitably  be  overestimates  of  the  true  cellular  abundance. 

With those caveats in mind, however, protein-to-protein relative comparisons remain
valuable. To perform this analysis, the set of MS spectra at each of the 14 diel
timepoints identified as belonging to peptides of Prochlorococcus MED4 proteins were
analyzed using the APEX Quantitative Proteomics Tool (v. 1.0.0, Braisted et al., 2008).
The Random Forest classifier was trained on a merged set of data from the 06:00, 12:00,
18:00 and 24:00 timepoints, using a set of 50 proteins observed in all 14 timepoints and
the same protein database used for peptide identification. All peptide properties were
used by the classifier, with a minimum peptide length of 6 amino acids and 2 missed
cleavages allowed. After training and generation of O_i values, APEX scores were
generated  for  each  timepoint.  All  data  were  normalized  such  that  the  APEX  scores  for 
proteins  observed  at  a  given  timepoint  sum  to  one.  Hence  the  APEX  score  for  each 
protein  is  equivalent  to  its  fraction  of  the  total  (detected)  cellular  protein  at  a  given 
timepoint. 
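
The normalization, and the nominal copies-per-cell conversion discussed earlier, amount to the following arithmetic. This Python sketch uses made-up protein names and raw scores purely for illustration; only the 0.005 fraction and the 500,000 molecules-per-cell figure come from the worked example in the text.

```python
def normalize_scores(raw_scores):
    """Rescale raw scores so that all proteins at one timepoint sum to one."""
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return {name: score / total for name, score in raw_scores.items()}


def nominal_copies_per_cell(fraction, total_molecules):
    """Convert a fractional abundance to a nominal copy number.

    This overestimates true abundance whenever undetected proteins exist.
    """
    return fraction * total_molecules


# Hypothetical raw scores for three proteins at a single timepoint.
fractions = normalize_scores({"protA": 12.0, "protB": 6.0, "protC": 2.0})
print(fractions)

# Worked example from the text: a fraction of 0.005 of 500,000 molecules.
print(nominal_copies_per_cell(0.005, 500_000))
```

Because the denominator includes only detected proteins, each fraction is relative to the observed proteome rather than to the whole cell.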

We also compared the results of the APEX quantification to the simple, MS1-based,
label-free “absolute” quantification method of Silva et al. (2006). The Silva et al. (2006)
method (hereinafter referred to as “Top3”) is based on an empirical observation of the
correlation between protein abundance and the average of the MS1 intensities of the three
peptides belonging to that protein giving the largest MS1 peaks. Here we limited the
application  of  the  Top3  method  to  proteins  where  10  or  more  unique  peptides  were 
detected,  as  was  the  case  for  the  results  of  Silva  et  al.  (2006).  The  Top3  abundance 
values  were  normalized  in  the  same  way  as  the  APEX  scores,  i.e.,  as  a  proportion  of  the 
total  at  a  given  timepoint.  Top3-based  abundances  were  thus  generated  for  1799  (20%) 
of  the  8812  protein-by-time  datapoints  in  the  full  diel  dataset.  A  comparison  between 
APEX and Top3 abundance values is shown in Supplementary Figure 3. The correlation
coefficient  between  the  results  of  the  two  quantification  methods  is  0.59  over  two  orders 
of  magnitude  in  predicted  abundance,  indicating  broad  agreement.  Similar  results  were 
obtained  by  Malmstrom  et  al.  (2009)  in  their  quantification  of  the  Leptospira  interrogans 
proteome. It should be noted that the predicted abundances of some proteins vary by
over  50-fold  between  APEX  and  Top3  (Supplementary  Figure  3).  Combined  with  the 
biases  inherent  to  a  finite  analytical  window,  these  limitations  suggest  that  fractional 
abundance  values  are  perhaps  best  interpreted  in  terms  of  the  distribution  of  the  cellular 
proteome  among  different  gene  products,  rather  than  as  precise  measures  of  the 
abundances  of  individual  proteins. 
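
The Top3 calculation itself is straightforward. The Python sketch below averages the three largest peptide intensities for a protein and applies the 10-peptide minimum used here; the function name and the example intensities are hypothetical.

```python
def top3_abundance(peptide_intensities, min_peptides=10):
    """Mean of the three largest peptide intensities, or None if too few peptides."""
    if len(peptide_intensities) < min_peptides:
        return None
    top_three = sorted(peptide_intensities, reverse=True)[:3]
    return sum(top_three) / 3.0


# Hypothetical intensities (arbitrary units) for a protein with 10 unique peptides.
well_sampled = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(top3_abundance(well_sampled))

# A protein with only three peptides is excluded from Top3 quantification.
print(top3_abundance([5.0, 4.0, 3.0]))
```

The resulting values would then be divided by their sum at each timepoint, in the same way as the APEX scores.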

Finally,  while  the  comparison  between  the  transcript  and  protein  levels  of  the  cyclic 
expression  of  individual  genes  in  the  following  sections  utilizes  the  published  diel  mRNA 
timecourses  of  Zinser  et  al.  (2009),  we  sought  to  incorporate  preliminary  results  from  our 
in-progress RNA-sequencing transcriptomics dataset in order to compare the cellular
abundances of gene products. Since RNA-sequencing is, to first order, unbiased across
templates,  the  fractional  abundance  of  reads  in  an  RNA-Seq  dataset  belonging  to  a  given 
transcript  equals  the  proportional  abundance  of  that  transcript  in  the  sample.  To  enable 
comparison  with  the  protein  abundance  values,  we  calculated  an  equivalent  transcript 
abundance  score,  termed  AMEX  (Absolute  Message  Expression),  and  given  by: 

\[
\mathrm{AMEX}_i = \frac{\left( \dfrac{\text{bp of sequence from transcript } i}{\text{length of transcript } i} \right)}{\left( \dfrac{\text{bp of sequence from protein-coding regions}}{\text{total length of protein-coding genes}} \right)}
\]

At  the  time  of  this  writing,  RNA-Seq  results  have  been  obtained  for  only  two  timepoints, 
08:00  and  18:00.  AMEX  analysis  of  those  datasets  is  presented  in  Section  3.5. 
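
As an illustration of the AMEX definition, the following Python sketch computes the score for a toy set of transcripts. The gene names, read totals and lengths are invented for the example, and `amex_scores` is a hypothetical helper rather than part of any published tool.

```python
def amex_scores(transcripts):
    """Per-transcript coverage divided by mean coverage of all protein-coding genes.

    transcripts maps a gene name to (bp of sequence mapped, gene length in bp).
    """
    total_bp = sum(bp for bp, _length in transcripts.values())
    total_length = sum(length for _bp, length in transcripts.values())
    mean_coverage = total_bp / total_length
    return {
        name: (bp / length) / mean_coverage
        for name, (bp, length) in transcripts.items()
    }


# Two hypothetical genes of equal length, one with three times the mapped sequence.
scores = amex_scores({"geneA": (3000, 1000), "geneB": (1000, 1000)})
print(scores)
```

A score above 1 indicates a transcript covered more deeply than the average protein-coding gene, and a score below 1 the opposite.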

It  should  be  noted  that,  with  the  exceptions  of  Xcalibur  (used  for  primary  mass  spectral 
data acquisition on the LTQ-FT) and MATLAB, all software used in the analysis of this
proteomics  dataset  is  freely  distributed  and  open  source. 

3. Results: Protein and transcript expression over the Prochlorococcus diel cell cycle

3.1  Cell  growth  &  diel  cycling 

The  cells  were  well-entrained  to  the  diel  cycle  in  the  sunbox  incubator.  Figure  2A  shows 
the  density  of  cells  in  the  culture  over  the  period  of  diel  sampling.  Cell  density  was 
steady  throughout  most  of  the  day,  increasing  sharply  after  sunset  to  close  to  double  the 
value  at  the  start  of  the  experiment.  Cell  cycle  analysis  (Figure  2B)  confirms  that  the 
culture  was  synchronized  to  the  diel  cycle.  Cells  spent  the  hours  from  early  morning 
until noon in B/G1 phase. DNA synthesis (C/S phase) began in the afternoon and was
largely complete by dusk. The peak of D/G2+M phase was at sunset and cells began to
divide then, but, as shown by the slope of the density curve, cytokinesis continued until
midnight or shortly thereafter. The observed progression and timing of the cell cycle is
entirely consistent with previous observations in culture and in the field (Vaulot et al.,
1995; Jacquet et al., 2001), and further demonstrates the consistency and reproducibility
of  the  Prochlorococcus  diel  cell  cycle.  In  particular,  the  diel  growth  in  this  experiment  is 
very  similar  to  that  observed  by  Zinser  et  al.  (2009)  using  the  same  strain  under  similar 
(though  not  identical)  conditions,  enabling  comparison  of  results  between  the  two  studies. 
Differences  between  the  culture  conditions  of  Zinser  et  al.  (2009)  and  those  used  here 
include  the  length  of  the  light  period  (14  hours  vs.  13  hours)  and  the  maximum  midday 
light intensity (232 vs. 205 µmol quanta m^-2 s^-1).

3.2  Quantitative  proteomic  analysis  of  expression  cycling 

A  total  of  1021  different  Prochlorococcus  proteins  were  observed  over  the  course  of  the 
diel  experiment  -  more  than  half  of  the  annotated  MED4  genome.  Between  524  and  764 
proteins were detected at each sampling timepoint, out of a total of 1984 putative open
reading frames (ORFs) in the current annotation of the MED4 genome (Figure 3A). The
average coverage (i.e., proportion of amino acid positions represented in observed
peptides) of detected proteins was 25.1%, with a mean of 6.8 unique peptides per protein.

Figure 2. A Cell density in the culture over the timecourse of the experiment, as counted by
flow cytometry. Density was essentially constant between sunrise and sunset, and increased
after dark as cells began cytokinesis. Cell density doubled over the course of the experiment,
meaning >90% of cells divided. B Proportion of cells in the 3 phases of the cell cycle, measured
by DNA staining and flow cytometry. DNA replication began in the afternoon and was largely
complete by dusk, when cell division began. The times of experimental sunrise and sunset
(05:30 and 18:30, respectively) are indicated by symbols.

Since  a  primary  goal  of  this  study  is  to  quantify  protein  expression  over  time,  consistent 
observation  of  a  set  of  proteins  across  all  (or  at  least  a  majority  of)  timepoints  is  critical. 
Figure  3B  shows  the  number  of  timepoints  at  which  each  observed  protein  was  detected. 
Of  the  1021  total  detected  proteins,  584  were  found  in  at  least  8  timepoints,  and  360  were 
detected  in  all  14  timepoints.  The  regularly-observed  proteins  actually  constitute  the 
majority  of  the  dataset,  since  they  exhibit  the  greatest  number  of  detected  peptides  and 
LC-MS  peaks.  These  consistently-observed  proteins  are  the  group  we  will  focus  on  in 
examining expression patterns over the diel. The proteins observed at only a few
timepoints  are  very  likely  genuine  observations  (having  passed  through  the  same 
identification  and  validation  process),  but  little  can  be  said  about  their  expression 
timecourses;  there  are  not,  for  example,  clear-cut  instances  of  catching  only  the  peak  of  a 
protein’s  expression  in  just  a  few  successive  timepoints.  The  great  majority  of  these 
infrequently-observed proteins are detected sporadically over the diel, consistent with
them  being  expressed  at  relatively  low  levels  and  hovering  on  the  edge  of  the  analytical 
window. 

In the current MED4 genome annotation, there are 963 ORFs for which no peptides were
detected at any of the diel timepoints. Some of these genes are essential to critical cell
functions and were undoubtedly expressed during this experiment; their absence reflects
the widely varying detectability of different proteins in LC-MS-based proteomics.
Examples of such genes include subunits of photosystems I and II and the cytochrome b6f
complex, and several nucleotide biosynthesis enzymes. A majority of the undetected
proteins, however, are short and/or hypothetical ORFs that may or may not actually
produce proteins. Of the 963 undetected ORFs, 53% (512) are annotated as
“hypothetical” or “conserved hypothetical”, which is significantly higher than the 25% of
detected proteins bearing those annotations. Peptides from 252 hypothetical ORFs were
detected in this dataset, however, confirming their translation into an actual protein
product.

Figure 3. A Number of Prochlorococcus MED4 proteins detected at each of the 14 timepoints
over the 28-hour diel sampling period. B Number of timepoints at which each MED4 protein
was detected. In the total (unfiltered) dataset, 1021 proteins were detected at at least one
timepoint, 584 at eight or more timepoints, and 360 proteins were found in all of the 14
timepoints. No peptides were detected from 963 ORFs in the current MED4 genome
annotation; these genes include a substantial number of small and/or hypothetical proteins.

Using  the  criteria  and  procedures  described  in  Section  2.4,  the  data  from  the  584  proteins 
observed  in  the  majority  of  timepoints  was  filtered  to  produce  protein  expression 
timecourses  for  548  genes.  These  timecourses  are  plotted  in  Supplementary  Figure  1.  To 
assess  how  many  of  these  timecourses  showed  evidence  of  cyclic  expression  over  the 
diel, we performed an analysis (described in Sec. 2.4.2) that produced a q-value for each
timecourse for the null hypothesis of no significant cycling with 24-hour periodicity. The
q-value is a measure of the false discovery rate, i.e., the proportion of false positives
expected if cycling in a given timecourse is considered significant. Zinser et al. (2009)
used a q-value cutoff of 0.1 and found that 82% of expressed protein-coding genes had
significantly cyclic expression at the mRNA level. As shown by the dotted line in Figure
4A, applying the same cutoff to the diel proteomics dataset results in only 92 of 548
timecourses exhibiting significant diel periodicity. Taken at face value, this result would
imply that the protein-level expression of only 15% of MED4 genes cycles over the diel.

From inspection of the timecourses, however, it is clear that this quantitative proteomics
data  is  substantially  sparser  than  the  microarray  data  of  Zinser  et  al.  (2009),  and  it  is 
likely  that  the  cycling  of  a  substantial  number  of  genes  is  masked  by  analytical  noise. 

The 15% of timecourses at q<0.1, then, is probably a minimum estimate for the
proportion of cycling proteins, and this number would increase with dataset richness. An
alternative approach to estimating the proportion of cycling timecourses is to calculate
the proportion of expected true positives as a function of the q-value cutoff. As the
q-value threshold is raised, the proportion of expected true positives (i.e., cycling proteins)
reaches a plateau where q=0.5, since at that point each protein added to the dataset is
equally likely to be cycling as non-cycling. As shown in Figure 4B, this plateau is
reached at a value of roughly 43% cycling proteins. So while specific evidence of
cycling can be identified in only 92 timecourses, the shape of the q-value distribution
suggests that almost half of proteins could be cycling - though it does not specify the
members of that group, nor the parameters (e.g., phase, amplitude) of their cycling. This
result, while hardly robust in itself, is generally consistent with an estimate of the
proportion of cycling genes derived from the temporal distribution of peak protein
abundances (see Sec. 3.3.3).

Figure 4. Results of analysis of diel cycling of protein timecourses. A Number of proteins
considered to be significantly cycling with 24-hour periodicity, as a function of q-value, a measure of the
false discovery rate. Insets show examples of protein timecourses with low (HimA, q=0.04) and
high (SqdB, q=0.46) q-values. At the chosen cutoff of 0.1 (shown by the dotted line and equivalent
to that used by Zinser et al. (2009)), 92 proteins show significant cycling. Note that the number of
cycling proteins is not very sensitive to q-value in the range 0.09-0.15. B An estimate of the
proportion of expected cycling proteins, calculated from the proportion of expected true positives
(i.e., cycling proteins). As more timecourses are considered, the expected proportion of cycling
proteins plateaus near 43%, once the q-value reaches 0.5. So while clear evidence of cycling can
be seen in only 15% of individual protein timecourses, the shape of the q-value distribution
suggests that up to 43% may have some diel periodicity.

It  should  be  noted  that  mass  spectrometry-based  proteomics  data  in  general  is  much 
sparser  in  terms  of  coverage  and  redundant  observations  of  gene  product  abundance  than 
microarray or massively parallel RNA-sequencing is. Large-scale comparisons of gene
expression  patterns  at  different  biological  levels  using  different  methodologies  must  be 
mindful  of  the  relative  richness  and  biases  of  various  datasets.  In  the  next  section,  we 
focus  on  the  relationship  between  mRNA  and  protein  expression  for  genes  where  the 
cycling  (or  lack  thereof)  is  well-resolved  at  both  levels. 

3.3  Relationships  of  mRNA  and  protein  expression  cycles 

Beyond  simply  characterizing  the  Prochlorococcus  proteome,  a  key  aim  of  this  study  is 
to  illuminate  the  dynamic  relationship  between  gene  product  abundances  at  the  transcript 
and  protein  levels.  Here  we  compare  protein  expression  timecourses  with  the  mRNA- 
level  results  of  Zinser  et  al.  (2009).  It  is  important  to  recognize  that  certain  aspects  of  the 
two  experiments  limit  how  precisely  they  can  be  intercompared.  In  particular,  the 
photoperiod  in  the  work  of  Zinser  et  al.  (2009)  was  14  hours,  an  hour  longer  than  in  this 
experiment,  and  the  sampling  points  in  the  two  studies  were  shifted  relative  to  their 
respective  dusk  and  dawn.  While  there  may  be  some  quantitative  differences  in  detail,  we 
expect  the  mRNA-protein  relationships  outlined  below  to  remain  qualitatively  the  same. 

In  the  near  future,  a  complete  diel  transcriptome  from  this  experiment  will  be  available 
(Rodrigue  et  al.,  in  prep),  allowing  direct  and  precise  comparisons  of  simultaneous 
transcript  and  protein  abundance  measurements.  Analysis  of  the  transcriptome  samples  is 
currently underway; to date, RNA-Seq libraries have been sequenced for two timepoints
from  this  experiment,  08:00  and  18:00.  While  data  from  just  two  timepoints  is 
insufficient  to  assess  transcript  expression  cycling,  analysis  of  this  data  is  presented  in 
Sec.  3.4  in  the  context  of  cellular  gene  product  abundances. 

3.3.1  Temporal  patterns  of  transcript  &  protein  abundance 

Three  examples  of  the  types  of  mRNA-protein  cycling  relationships  observed  in  the 
Prochlorococcus  diel  cell  cycle  are  shown  in  Figure  5.  The  first  is  exemplified  by  nrdJ, 
encoding  the  ribonucleotide  reductase  that  converts  ribonucleotides  to 
deoxyribonucleotides by removing the 2’-hydroxyl of ribose, which is an essential step in
DNA  synthesis.  This  gene  shows  what  might  be  considered  the  ‘expected’  relationship 
between  transcript  and  protein  abundance  dynamics:  the  protein  abundance  tracks  the 
transcript  abundance  quite  closely,  showing  the  same  amplitude  of  variation  and  perhaps 
a  slight  temporal  lag.  Both  mRNA  and  protein  vary  in  abundance  by  about  10-fold  over 
the  diel  cycle.  The  timing  of  peak  abundance  also  makes  intuitive  sense:  the  abundance 
of  nrdJ  gene  products  increases  through  C/S  phase,  when  cells  are  replicating  their 
chromosomes  and  thus  require  deoxyribonucleotides,  and  declines  thereafter.  nrdJ,  then, 
is likely an example of a gene whose expression is dominated by transcriptional regulation.
It  turns  out,  however,  that  this  apparently  straightforward  relationship  between  transcript 
and  protein  is  the  exception  rather  than  the  rule. 

A  more  widely  observed  pattern  is  exemplified  by  rbcL,  the  large  subunit  of  the  principal 
carbon  fixation  enzyme  Rubisco  (Figure  5B  and  E).  In  this  case,  the  abundance 
oscillation  over  the  diel  is  35  times  stronger  at  the  transcript  level  (46-fold  change)  than 
at  the  protein  level  (1.3-fold).  We  hypothesize  that,  in  this  case,  the  mRNA-level 
variability  in  expression  is  strongly  damped  by  posttranscriptional  processes. 
Nevertheless, the small-amplitude diel cycle in RbcL protein abundance was
well-resolved (Fig. 5B, inset), and it shows a peak in the early afternoon - a time when rbcL
transcript abundance is declining from its peak near sunrise. So in this case, mRNA and
protein dynamics are decoupled in both amplitude and phase. As documented in Section
3.4, this pattern is observed repeatedly in central metabolic pathways in Prochlorococcus.

Figure 5. Examples of relationships between expression cycling at the mRNA and protein levels.
Protein (blue) and mRNA (pink; Zinser et al., 2009) timecourses for three genes are shown: nrdJ (ribonucleotide
reductase), rbcL (Rubisco large subunit), and gyrB (subunit B of DNA gyrase). All 3 genes have
q-values for cycling <0.1 for both mRNA and protein. For comparison, expression timecourses for
each gene are shown scaled to their own amplitude range (A-C) and on a common scale of ±4
log2-units (D-F), reflecting the range of fold changes in Prochlorococcus genes over the diel cycle
(2^8 = 256-fold). Transcript and protein abundances for nrdJ are well-correlated, tracking closely in
both amplitude and phase, and varying ~10-fold over the diel. rbcL, on the other hand, oscillates
much more strongly at the mRNA level (46-fold) than at the protein level (1.3-fold); the protein
timecourse is shown inset. gyrB, while cycling at both levels, shows a small amplitude of variation
(<2-fold), and is damped and somewhat phase-shifted at the protein level compared to the
mRNA.

The  third  example  is  less  extreme  than  the  first  two.  For  the  gene  gyrB,  which  codes  for 
a  subunit  of  the  DNA  gyrase  that  causes  negative  supercoiling  of  the  bacterial 
chromosome, neither transcript nor protein shows more than 2-fold abundance changes
over the diel. On the scale of the nrdJ and rbcL oscillations, the abundance of these gene
products hardly appears to change at all (Figure 5F). The low-amplitude cycles of gyrB
and GyrB are fairly well resolved, however (Figure 5C), and show that the protein cycle
is  again  somewhat  damped  and  phase-shifted  relative  to  the  mRNA.  As  discussed  below, 
this  is  a  common  pattern  seen  between  transcript  and  protein  levels  over  the 
Prochlorococcus  diel  cycle. 

3.3.2  Transcript-protein  dynamics:  amplitude 

The  examples  in  the  previous  section  show  that  the  relationship  between  gene  product 
abundances  at  the  protein  and  mRNA  levels  can  be  highly  variable  from  gene  to  gene. 

For  253  genes  whose  proteins  were  observed  at  12  or  more  timepoints,  we  examined  the 
relationship  between  the  fold  change  (the  ratio  of  highest  to  lowest  abundance)  of  protein 
and transcript (Fig. 6). Considering first just those with protein cycling q-values less than
0.1  (the  blue  points  in  Fig.  6),  it  is  clear  that  the  great  majority  (50  of  55)  are  below  the 
1:1  line.  That  is  to  say,  for  91%  of  these  genes,  the  amplitude  of  variation  seen  in  the 
mRNA  abundance  cycle  is  diminished  at  the  protein  level.  When  all  genes  plotted  in 
Figure  6  are  considered,  the  proportion  below  the  1:1  line  drops  to  79%,  though  many  of 
those  above  the  line  are  noisier  protein  timecourses  whose  amplitudes  would  likely 
decrease  with  more  data.  This  relationship  also  highlights  how  unusual  the  case  of  nrdJ 
is:  it  supports  by  far  the  strongest  correlation  between  transcript  and  protein  levels.  rbcL, 
while  towards  an  extreme,  is  part  of  a  general  trend  of  strong-transcript,  weak-protein 
cycles. gyrB is more typical of the most densely-populated region of the amplitude
space. Overall, the comparison of expression fold-changes suggests that mRNA-level
variations are broadly damped at the protein level.

Figure 6. Comparison of the amplitudes of expression cycling at the mRNA and protein levels. If
protein-level expression tracked transcript-level variation quantitatively, points would lie on or near
the 1:1 line shown. Instead, most genes show quite different amplitudes of variation between the
protein and mRNA levels, with the majority being damped at the protein level compared to the
transcript. Genes are colored by their q-values for diel cycling; most of the cases of protein fold
change appearing greater than that of mRNA are attributable to noisy, non-cycling protein
timecourses (gray points above 1:1 line). For clarity, only genes whose protein timecourses included
at least 12 timepoints (253 total) are shown. Comparisons of the mRNA and protein timecourses for
genes indicated by name are shown in Figure 5. mRNA-level data from Zinser et al. (2009).
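
The amplitude comparison in Section 3.3.2 can be illustrated with a minimal sketch. It assumes each gene has a transcript and a protein timecourse of positive relative abundances; the function names and the toy values are hypothetical and are not taken from the dataset.

```python
def fold_change(timecourse):
    """Ratio of highest to lowest abundance in a timecourse of positive values."""
    lo, hi = min(timecourse), max(timecourse)
    if lo <= 0:
        raise ValueError("abundances must be positive")
    return hi / lo


def fraction_damped(pairs):
    """Fraction of genes whose protein fold change is below the mRNA fold change,
    i.e. the fraction of points falling under the 1:1 line.

    pairs: iterable of (mrna_timecourse, protein_timecourse)."""
    pairs = list(pairs)
    below = sum(1 for mrna, prot in pairs if fold_change(prot) < fold_change(mrna))
    return below / len(pairs)


# Hypothetical toy data (not from the thesis): three genes, relative abundances.
toy_genes = [
    ([1.0, 8.0, 2.0], [1.0, 1.5, 1.2]),  # strong transcript cycle, weak protein cycle
    ([1.0, 1.6, 1.1], [1.0, 1.3, 1.1]),  # both low amplitude, protein still damped
    ([1.0, 1.2, 1.0], [1.0, 1.9, 0.9]),  # noisy protein timecourse above the 1:1 line
]
toy_fraction = fraction_damped(toy_genes)
```

Applied to the counts quoted in the text, the same ratio gives 50/55, or about 91%, for the genes with significant protein cycling.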

The  methods  used  here  to  detect  cyclic  protein  expression  (metabolic  isotope  labeling, 
Fourier  scoring  with  false  discovery  rates;  see  Sec.  2.4)  can  confidently  detect  protein 
abundance oscillations even when the amplitude is less than 2-fold - as it is for most
observed proteins (Figure 6). When protein abundances are tracked over time with
sufficient temporal resolution, small variations that would be impossible to resolve in
2-sample comparisons become apparent and interpretable. If we had simply contrasted
protein  abundances  at,  say,  sunrise  and  sunset  or  midday  and  midnight,  few  differences 
would  be  significant.  It  would  appear  that  the  protein  abundance  variations  that 
accompany  (and  underlie)  the  light-dark  entrained  cell  cycle  of  Prochlorococcus  are 
generally  of  a  smaller  magnitude  than  those  observed  in  typical  laboratory  gene 
expression  experiments  involving  strong,  systemic  perturbations  such  as  acute  nutrient 
starvation,  temperature  shock  or  toxin  exposure. 
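
To make concrete why a well-sampled timecourse can reveal sub-2-fold cycles, the following sketch scores a timecourse by the share of its variance carried by a single 24-hour sinusoid and attaches a permutation p-value. This is an illustrative stand-in, not the scoring procedure of Sec. 2.4: the normalization, the permutation test, the sampling times and the toy amplitude are all assumptions, and a false discovery rate correction across proteins would still have to follow.

```python
import cmath
import math
import random


def fourier_score(times_h, values, period_h=24.0):
    """Share of timecourse variance carried by a single sinusoid of period_h.

    For evenly spaced samples covering whole periods this lies between 0 and 1."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    total = sum(c * c for c in centered)
    if total == 0:
        return 0.0
    coeff = sum(c * cmath.exp(-2j * math.pi * t / period_h)
                for t, c in zip(times_h, centered))
    return 2 * abs(coeff) ** 2 / (n * total)


def permutation_p_value(times_h, values, n_perm=999, seed=0):
    """Empirical p-value: how often a shuffled timecourse scores at least as high."""
    rng = random.Random(seed)
    observed = fourier_score(times_h, values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fourier_score(times_h, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical timecourse: sampled every 2 h, cycling by only ~1.3-fold.
toy_times = list(range(0, 24, 2))
toy_values = [1.0 + 0.13 * math.cos(2 * math.pi * (t - 6) / 24.0) for t in toy_times]
toy_score = fourier_score(toy_times, toy_values)
toy_p = permutation_p_value(toy_times, toy_values)
```

A comparison of just two of these samples would show at most a 1.3-fold difference, whereas the periodic structure across all twelve is unambiguous.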

3.3.3  Transcript-protein  dynamics:  phase 

Besides  the  amplitude  or  magnitude  of  abundance  changes,  the  phase  of  expression 
cycles  (i.e.,  the  timing  of  the  peaks  and  troughs  in  abundance)  can  also  vary  between  the 
mRNA  and  protein  levels.  If  transcriptional  control  were  dominant,  we  might  expect  to 
see  a  small,  more  or  less  consistent  lag  between  the  peaks  of  transcript  and  protein 
abundance,  simply  due  to  the  finite  speed  of  translation  -  much  as  observed  for  nrdJ 
(Figure  5A).  As  shown  in  Figure  7,  however,  the  protein  abundance  cycles  observed  in 
this  experiment  show  a  wide  range  of  phase  relationships  with  their  respective  transcripts. 
A  substantial  number  do  plot  slightly  above  and  to  the  left  of  the  1:1  “in  phase”  diagonal, 
which  is  where  proteins  lagging  their  transcripts  would  fall,  though  the  range  of  lags  in 
that  group  spans  1-8  hours.  This  is  many  times  longer  than  the  half-life  of 
Prochlorococcus mRNA molecules (which averages 2.4 minutes; Steglich et al., in prep)
and, even given the slow translation rates in Prochlorococcus (see Chapter 5), it is
difficult to see how transcript abundance variations would take, say, 6 hours (a quarter of
the cell cycle) to be transmitted to the protein level. Assessment of the significance of
these smaller phase offsets will await the transcriptome timecourses from the current
experiment.

Figure 7. Relationship between phasing of protein- and mRNA-level expression, based on data from
the current experiment and Zinser et al. (2009), respectively. A Peak abundance times at the protein
and transcript levels for genes with protein cycling q-values <0.1 (n=92). Genes plotting along the
central blue diagonal (or in the upper left or lower right corners) have their peaks of protein and
transcript abundance at the same time during the diel cycle, i.e., the protein and mRNA are in phase.
An example is sir (ferredoxin-sulfite reductase), whose expression timecourses are shown in B. Genes
plotting near either of the two red diagonals have their protein and mRNA peaks offset by 12 hours
(i.e., antiphase). C shows an example of antiphase expression, chlP (geranylgeranyl diphosphate
reductase).
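
Because peak times live on a 24-hour clock, the lag between transcript and protein peaks has to be computed circularly. The helper below is a minimal sketch of that calculation; the function name and the sign convention (positive when the protein peaks later) are assumptions made for illustration.

```python
def protein_lag_hours(mrna_peak_h, protein_peak_h, period_h=24.0):
    """Signed lag of the protein peak behind the transcript peak on a circular clock.

    Returns a value in [-period/2, period/2): positive means the protein peaks
    later than its transcript, and a magnitude near period/2 means antiphase."""
    half = period_h / 2.0
    return (protein_peak_h - mrna_peak_h + half) % period_h - half


# Hypothetical peak times in hours after midnight.
lag_small = protein_lag_hours(6, 8)        # protein peaks 2 h after its transcript
lag_wrapped = protein_lag_hours(23, 1)     # still a 2 h lag, across midnight
lag_antiphase = protein_lag_hours(6, 18)   # peaks half a cycle apart
```

Genes on the "in phase" diagonal of Figure 7A have lags near zero under this definition, while those near the red diagonals have lags of magnitude 12 hours.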

There are, however, a number of genes (e.g., chlP, Fig. 7C) for which the phasing of
mRNA  and  protein  abundance  are  clearly  divergent  and  involve  posttranscriptional,  and 
probably  posttranslational,  processes.  It  is  conceivable  that  for  these  genes,  protein 
degradation  is  also  strongly  cyclic,  and  this  imposes  another  time-varying  driver  on 
protein  abundance.  If  protein  degradation  and  transcript  abundance  both  cycle  with 
sufficient  amplitude  and  with  different  phases,  the  observed  timecourse  of  protein 
abundance  would  represent  the  sum  of  these  two  oscillating  source  and  sink  terms.  Why 
certain  proteins  would  be  subject  to  this  mode  of  regulation  remains  a  question  for  future 
investigation. 

Zinser  et  al.  (2009)  found  that  a  majority  of  Prochlorococcus  transcripts  peak  in 
abundance  near  sunrise  and  sunset.  A  two-peaked  distribution  of  peak  abundance  times 
was  also  found  in  the  protein  timecourses  from  this  experiment  (Figure  8).  Consistent 
with  the  peaks  in  the  abundance  of  transcripts,  the  peaks  in  the  protein  phase  distribution 
follow shortly after sunrise and sunset. The histograms in Figure 8 also provide a way
to  estimate  the  proportion  of  non-cycling  genes.  If  no  genes  were  significantly  cycling, 
the  distribution  of  phases  would  be  random,  and  the  histogram  should  look  flat,  with  the 
same  number  of  genes  ‘peaking’  at  each  timepoint.  Viewed  in  this  way,  the  sunrise  and 
sunset  peaks  sit  on  top  of  a  flat,  ‘background’  distribution  of  roughly  12  proteins  per 
timepoint. If those 288 (12 proteins × 24 timepoints) are considered non-cycling, then
the predicted proportion of cycling proteins is 47%, similar to the 43% estimated from
the shape of the q-value distribution, above.
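
The background-subtraction estimate can be written out as a short calculation. In the sketch below the flat level of 12 proteins per hourly bin comes from the text, but the total of 543 proteins is back-calculated from the quoted 47% (288/0.53) and the positions and heights of the two peaks are invented purely for illustration.

```python
def cycling_fraction_from_histogram(peak_counts, background_per_bin):
    """Estimate the share of cycling proteins from a histogram of peak times.

    peak_counts: number of proteins peaking in each time bin.
    background_per_bin: assumed flat level contributed by non-cycling proteins."""
    total = sum(peak_counts)
    non_cycling = min(total, background_per_bin * len(peak_counts))
    return 1.0 - non_cycling / total


# Hypothetical histogram: a flat background of 12 per hourly bin plus two peaks.
# The peak bins and heights are made up so that the total is 543 proteins.
toy_counts = [12] * 24
toy_counts[7] += 130   # assumed post-sunrise peak
toy_counts[19] += 125  # assumed post-sunset peak
toy_cycling = cycling_fraction_from_histogram(toy_counts, background_per_bin=12)
```

The estimate depends directly on where the flat background is drawn, so it is best read as a rough cross-check on the q-value-based figure rather than an independent measurement.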
Figure 8. Histograms of times of peak expression (x-axis: time of peak abundance) with hourly
resolution for (A) proteins in the current experiment and (B) transcripts in the diel microarray
experiment of Zinser et al. (2009). Both distributions show peaks near sunrise and sunset, whose
times during the two experiments are indicated by symbols. Note that the photoperiod is one hour
longer in B than in A. The white dotted line in the upper panel represents a hypothetical null
distribution of non-cycling proteins, approximately 12 per timepoint.
3.4 Diel balance of carbon metabolism in Prochlorococcus


One  of  the  most  important  cellular  functions  in  Prochlorococcus,  and  arguably  the  most 
important  with  regard  to  its  role  in  ocean  biogeochemistry,  is  carbon  fixation.  The 
immediate  incorporation  of  CO2  into  organic  matter  is  catalyzed  by  Rubisco,  but  this  is 
just one reaction in a central carbon metabolic network that involves both reductive
(C-fixing) and oxidative (C-respiring) pathways. Two components of this metabolic
network,  the  Calvin  cycle  and  the  pentose  phosphate  pathway,  can  be  viewed  as  a 
superpathway  of  two  intersecting  cycles  working  in  opposite  directions  (Fig.  9A).  In 
essence,  the  reductive  portion  (the  Calvin  cycle)  trades  energy  (ATP)  and  reducing  power 
(NADPH)  for  fixed  carbon,  while  the  oxidative  portion  (the  pentose  phosphate  pathway) 
trades  fixed  carbon  for  reducing  power.  During  the  day,  photosynthesis  can  replenish  the 
ATP  and  NADPH  consumed  by  the  Calvin  cycle,  allowing  net  fixation  of  carbon.  At 
night,  reserves  of  fixed  carbon  stored  as  glycogen  are  consumed  and  NADPH  is 
regenerated. 

Proper  regulation  of  the  intersection  of  these  two  cycles  is  essential.  To  illustrate  why, 
consider  what  happens  if  the  metabolic  flux  is  allowed  to  run  around  the  outside  of  the 
superpathway:  chemistry  accomplished  in  one  part  of  the  cycle  is  undone  in  another,  with 
no  net  result  except  the  waste  of  3  molecules  of  ATP.  It  is  the  balance  of  fluxes  through 
the  intersection  of  the  reductive  and  oxidative  portions  that  determines  whether  net 
carbon  fixation  or  respiration  occurs. 

The  expression  of  the  genes  coding  for  enzymes  of  central  carbon  metabolism  was 
discussed  in  detail  by  Zinser  et  al.  (2009).  They  found  that  transcripts  for  components  of 
the  reductive  and  shared  portions  of  the  network  were  most  abundant  near  sunrise,  while 
those  for  the  oxidative  section  were  highest  near  sunset.  The  amplitude  of  expression 
variation  was  particularly  strong  for  the  initial  C-fixing  enzymes  of  the  Calvin  cycle, 
Rubisco  and  phosphoglycerate  kinase. 
Figure 9. (Facing page) Central carbon metabolism in Prochlorococcus. A The network of
central carbon metabolism, showing the intersection of the reductive (Calvin cycle) and
oxidative (pentose phosphate pathway) components. The Calvin cycle trades reducing power
for fixed carbon, while the oxidative pentose phosphate pathway does the reverse. The two
share a set of enzymes, shown in the central portion of the diagram. The direction of metabolic
flux (and hence whether net C fixation or respiration occurs) depends on regulation of the
activity of these shared enzymes. OpcA is a posttranslational regulator of Zwf (see text). B
Expression of carbon metabolism genes at the protein (blue) and transcript (pink; Zinser et al.
2009) levels. Most genes, especially those of the Calvin cycle, show strong oscillations in mRNA
abundance that are damped and phase-shifted at the protein level. Of these enzymes, only
transaldolase (Tal) changes in abundance more than 2-fold over the diel cycle. A is redrawn
after a figure by L.R. Thompson and used with kind permission.

compound abbreviations:

RuBP  ribulose-1,5-bisphosphate
PGA  3-phosphoglyceric acid
BPG  1,3-bisphosphoglycerate
GAP  glyceraldehyde 3-phosphate
DHAP  dihydroxyacetone phosphate
SBP  sedoheptulose-1,7-bisphosphate
S7P  sedoheptulose-7-phosphate
X5P  xylulose-5-phosphate
R5P  ribose-5-phosphate
E4P  erythrose-4-phosphate
FBP  fructose-1,6-bisphosphate
F6P  fructose-6-phosphate
G6P  glucose-6-phosphate
6PGL  6-phosphoglucono-δ-lactone
6PG  6-phosphogluconate
Ru5P  ribulose-5-phosphate

enzyme names:

rbcL/rbcS  ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco)
pgk  phosphoglycerate kinase
gap2  glyceraldehyde-3-phosphate dehydrogenase
tpi  triosephosphate isomerase
cbbA  fructose-1,6-bisphosphate/sedoheptulose-1,7-bisphosphate aldolase
glpX  fructose-1,6-bisphosphatase/sedoheptulose-1,7-bisphosphatase
tktA  transketolase
rpiA  phosphopentose isomerase
rpe  phosphopentose epimerase
prkB  phosphoribulokinase
tal  transaldolase
pgi  phosphoglucose isomerase
zwf  glucose-6-phosphate dehydrogenase
pgl  6-phosphogluconolactonase
gnd  6-phosphogluconate dehydrogenase
The results shown in Figure 9B add new insights to our understanding of carbon
metabolism and fixation by Prochlorococcus. The large-amplitude oscillation of
expression  at  the  mRNA  level,  especially  for  genes  in  the  reductive/Calvin  cycle  portion 
of  the  pathway,  is  strongly  damped  at  the  protein  level.  The  abundances  of  central 
carbon  metabolism  enzymes  show  only  small  variations  over  the  diel  cycle.  If  these 
small  changes  are  sufficient  to  alter  the  balance  of  fluxes  through  the  central  intersection 
-  as  they  apparently  are  -  this  implies  that  this  metabolic  network  is  poised  near  a  kind  of 
balancing  point.  The  many-fold  changes  in  transcript  levels  observed  by  Zinser  et  al. 
(2009)  do  not  result  in  wholesale  redistribution  of  protein  abundances  between  the 
oxidative  and  reductive  portions  of  the  cycle,  but  rather  nudge  the  network  to  one  side  or 
the  other  of  the  flux  balance  point. 

Notably, the only one of these central carbon metabolism enzymes to show more than
2-fold changes in abundance is transaldolase (tal), which is also the only component of the
central  intersection  to  act  solely  in  the  oxidative  direction.  These  data  suggest  that  the 
abundance  of  the  transaldolase  protein  is  a  key  parameter  regulating  the  direction  of  flux 
through  central  carbon  metabolism.  The  lower  abundance  of  transaldolase  during  the 
morning  hours  may  be  just  enough  to  constrict  flux  through  the  oxidative  pentose 
phosphate  pathway,  promoting  carbon  fixation.  The  importance  of  transaldolase  levels 
has  previously  been  inferred  from  the  presence  of  genes  encoding  transaldolase  in 
cyanophage  genomes  and  the  production  of  proteins  from  those  genes  during  phage 
infection  (L.R.  Thompson  et  al.,  in  prep). 

Zinser  et  al.  (2009)  also  raised  the  possibility  of  posttranslational  regulation  of  carbon 
metabolism. In particular, the PrkB/Gap2-binding inhibitor CP12 and the allosteric Zwf
effector OpcA were suggested to promote flux through the oxidative pathway at
nighttime. CP12 was not observed in our proteomics dataset; it is quite small (74 aa) and
would not remain bound to its regulatory targets under the denaturing conditions used to
prepare the samples. OpcA, though, was observed, and almost exclusively during the
dark  timepoints  (Figure  9A).  While  it  is  difficult  to  quantify  its  cycling  amplitude  since  it 
dropped  out  of  the  analytical  window  for  much  of  the  day,  OpcA  is  clearly  significantly 
more  abundant  at  night,  when  it  likely  promotes  flux  through  the  oxidative  portion  of  the 
pathway.  The  observed  constancy  of  the  abundances  of  enzymes  such  as  Zwf  reinforces 
the  importance  of  the  role  of  these  regulatory  proteins  in  modulating  fluxes  through 
metabolic  pathways. 

The  relatively  constant  abundance  of  Calvin  cycle  enzymes  also  helps  explain  a 
somewhat  puzzling  observation  in  the  experiment  of  Zinser  et  al.  (2009).  They  noted 
that, at night, the maximal light-saturated rate of carbon fixation drops to only 2- to
3-fold below its daytime peak (cf. Figure 3B of Zinser et al., 2009). This is surprising
given that Rubisco shows a more than 40-fold change in transcript abundance over the
diel, and other enzymes in the C-reduction pathway cycle by 4-10 fold at the mRNA level
(Fig. 9B). It is now clear that the troughs in transcript abundance for Calvin cycle genes
at sunset do not result in low levels of their respective proteins during the night.
Nighttime abundances of Calvin cycle proteins are only ~40% below their daytime peaks;
the  remaining  10-25%  drop  in  C  fixation  capacity  between  day  and  night  is  likely  due  to 
the  more  oxidized  state  of  the  cellular  metabolite  pool  in  the  dark  (L.R.  Thompson  et  al., 
in  prep).  Central  carbon  metabolism  in  Prochlorococcus  is  clearly  both  a 
biogeochemically  important  pathway  and  one  that  needs  to  be  understood  at  multiple 
levels  of  gene  expression  and  regulation. 

3.5  Cellular  gene  product  abundances  over  the  diel  cycle 

3.5.1  Correlation  of  fractional  transcript  and  protein  abundance 
The use of the APEX (Absolute Protein Expression) technique (Lu et al., 2007), and an
equivalent measure of transcript abundance, here termed AMEX (Absolute Message
Expression), allows us to express gene product abundances in fractional terms - that is, as
a percent of the total amount of protein detected at any one timepoint. The basis for these
metrics and the caveats to their interpretation are discussed in more detail in Sec. 2.5;
suffice it here to note that the APEX/AMEX abundances represent fractions of detected
gene  products  -  since  the  amount  of  detected  protein  or  mRNA  is  always  less  than  the 
total,  such  measurements  will  overestimate  true  cellular  abundances  to  some  degree. 

This  problem  is  significantly  more  acute  at  the  protein  level,  since  many  fewer  gene 
products  are  represented  in  the  proteomics  datasets  (27-39%)  than  in  the  RNA-Seq 
libraries  (89-92%).  With  these  experimental  limitations  in  mind,  however,  we  can  gain 
biological  insights  from  comparisons  of  fractional  gene  product  abundances. 

We  analyzed  the  relationship  between  fractional  abundances  at  the  protein  and  transcript 
levels,  for  the  two  timepoints  (08:00  and  18:00)  for  which  mRNA  sequencing  has  been 
performed to date (Fig. 10). The correlation between the two is quite low: mRNA
abundance explains only 8-18% of the variance in protein abundance. While it is
perhaps surprising that transcript and protein abundances should be so decoupled, similar
results have been obtained from protein-transcript comparisons in a variety of organisms,
including yeast (Gygi et al., 1999; Foss et al., 2007; Garcia-Martinez and
Gonzalez-Candelas, 2007; Tuller et al., 2007; Ingolia et al., 2009), E. coli (Ishihama et al., 2008),
Streptomyces (Jayapal et al., 2008), Desulfovibrio (Nie et al., 2006), mouse (Huttlin et al.,
2009) and human (Gry et al., 2009). The loose connection between mRNA and protein
abundances  is  a  theme  emerging  from  multiple  studies,  and  an  important  one  if  we  are  to 
quantify  molecular  interactions  between  microbes  and  their  environments. 
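
The "variance explained" figure quoted above is the square of a correlation coefficient between paired abundances. The sketch below shows one way such a value could be computed; working on log10-transformed fractions and dropping genes without a detected protein are assumptions (suggested by the multi-order-of-magnitude range in Figure 10), and the toy values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def variance_explained(mrna_fraction, protein_fraction):
    """r^2 between log10 fractional abundances, for genes detected at both levels."""
    pairs = [(math.log10(m), math.log10(p))
             for m, p in zip(mrna_fraction, protein_fraction)
             if m > 0 and p > 0]
    xs, ys = zip(*pairs)
    return pearson_r(xs, ys) ** 2


# Hypothetical fractions: protein exactly proportional to mRNA, plus one gene
# detected only as mRNA (protein fraction 0), which is excluded from the fit.
toy_mrna = [1e-4, 1e-3, 1e-2, 5e-2]
toy_protein = [2e-4, 2e-3, 2e-2, 0.0]
toy_r2 = variance_explained(toy_mrna, toy_protein)
```

In this idealized case the value is 1; the 8-18% reported here corresponds to correlation coefficients of roughly 0.28 to 0.42.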

It  should  also  be  noted  that  the  genes  whose  protein  products  were  not  detected  were  not 
necessarily rare at the transcript level. The gray points along the x-axes in Figure 10
show the fractional abundances of transcripts for which no peptides were detected, some
of which rank among the most abundant transcripts in the cell at a given timepoint.
While some of these are transcripts of hypothetical ORFs that may actually not be
translated, some of them are genes that encode important cellular functions and their
absence is due to the limited size and experimental biases of the proteomic dataset. The
possibility that a number of relatively abundant protein products are outside the analytical
window reinforces the approximate nature of fractional abundance measurements.

Figure 10. Correlation between mRNA and protein abundances (expressed as fraction of the
cellular total, based on AMEX/APEX scores) at (A) 08:00 and (B) 18:00. Blue points are genes detected
as both transcript and protein, while gray points along the x-axes are genes detected only as mRNA.
The correlation between abundance at the mRNA level and abundance at the protein level, while
positive, is not strong: it explains only 8-18% of the variance that spans >4 orders of magnitude.

3.5.2 Stability of proteome composition over the diel cycle

One  very  useful  question  that  gene  product  fractional  abundance  measurements  allow  us 
to  address  is  to  what  extent  cellular  resources  are  redistributed  among  different  parts  of 
metabolism  over  the  diel  cycle.  Are,  for  example,  photosynthesis  proteins  a  major 
portion  of  the  cellular  proteome  during  the  day,  but  then  degraded  and  hardly  present  in 
the cell at night? Does the diel cycle involve large-scale shifts of amino acids among
proteins involved in different cellular processes? As documented in Section 3.4, enzymes
of the Calvin cycle are still present in the cell at night, at more than half of their daytime
abundance, even though Prochlorococcus respires carbon at night rather than fixing it. A
proteome-wide view of fractional abundance changes over the diel is shown in Figure
11A, as the correlations between abundance profiles at three timepoints: 06:00 (just after
sunrise),  08:00,  and  18:00  (just  before  sunset).  If  a  large  proportion  of  the  proteome  were 
being  redistributed  among  different  sets  of  proteins  over  time,  we  would  expect  the 
contrast  between  sunrise  and  sunset  to  be  significantly  greater  than  that  between 
successive  timepoints.  This  is  not  borne  out  by  the  observations:  the  fractional 
abundance  of  proteins  at  06:00  correlates  equally  well  with  abundances  at  08:00  as  with 
those  at  18:00.  The  correlation  between  06:00  and  18:00  is  actually  slightly  higher  than 
that  between  06:00  and  08:00,  due  primarily  to  closer  correspondence  in  a  few  high- 
abundance  proteins.  There  does  not  appear  to  be  evidence  of  major  remodeling  of  the 
proteome  over  the  diel  cycle. 

A yet broader picture of proteome stability over the diel is revealed by the matrix of
pairwise correlations between fractional abundances at all timepoints (Fig. 11B). Each
set of APEX values was compared to every other, with the aim of gauging whether
protein abundances correlate better between nearby timepoints than between distant ones.
Spearman’s rank correlation (ρ; simply the standard Pearson’s r calculated on ranks) was
used here to avoid weighting the highest-abundance proteins too heavily and maintain a
proteome-wide measure of correspondence. Using ρ, the two correlations shown in
Figure 11A give the same coefficient. This matrix of correlations allows us to assess the
similarity in proteome composition as a function of temporal offsets between timepoints.
If proteome composition varied systematically over the diel cycle, the ρ values should
decrease away from the main diagonal to a minimum where the offset is 12 hours. In
fact, however, the ρ values at 12-hour offsets are not significantly lower than those at
2-hour offsets (one-tailed t-test, p=0.19). These data strongly suggest that the overall
composition of the Prochlorococcus proteome remains stable over the diel cycle.

Figure 11. Stability of the MED4 proteome over the diel cycle. A Correlation of fractional protein
abundances (APEX values) across timepoints. Protein abundance values at 06:00 (just after
sunrise) are compared with values two and twelve hours later. If the rank-abundance structure of
the proteome varied strongly over the diel, we would expect protein abundances to correlate
better between nearby timepoints (e.g., 06:00 and 08:00) than between widely separated ones
(e.g., 06:00 and 18:00). However, this is not observed; in this case, the correlation between
timepoints separated by 12 hours is actually slightly better than that between successive samples.
B Matrix of Spearman's rank correlation coefficient (ρ) values for protein abundances between all
diel timepoints. If proteome abundance structure changed with time, the ρ values would be
expected to decrease moving away from the diagonal, presumably to a minimum at an offset of 12
hours. Instead, no significant difference is seen in ρ values between the smallest (2 hours) and
greatest (12 hours) temporal offsets. The two correlations plotted in A are shown in bold.
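
The correlation-by-offset analysis can be sketched in a few lines. The code below computes Spearman's ρ as Pearson's r on average ranks and groups the pairwise values by circular time offset; the data layout (a dictionary keyed by sampling hour) and the toy abundances are assumptions, and the final comparison of the 2-hour and 12-hour groups by a one-tailed t-test is left out.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def rho_by_offset(profiles, period_h=24):
    """Group pairwise rho values by circular time offset in hours.

    profiles: dict mapping sampling hour -> abundances (same protein order)."""
    grouped = {}
    hours = sorted(profiles)
    for idx, a in enumerate(hours):
        for b in hours[idx + 1:]:
            diff = (b - a) % period_h
            offset = min(diff, period_h - diff)
            grouped.setdefault(offset, []).append(
                spearman_rho(profiles[a], profiles[b]))
    return grouped


# Hypothetical fractional abundances for four proteins at three timepoints.
toy_profiles = {
    6: [0.40, 0.30, 0.20, 0.10],
    8: [0.38, 0.31, 0.19, 0.12],
    18: [0.41, 0.28, 0.21, 0.10],
}
toy_grouped = rho_by_offset(toy_profiles)
```

If proteome composition shifted systematically over the day, the values grouped under the 12-hour offset would be expected to sit below those grouped under the 2-hour offset.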

3.5.3  Cellular  functions  of  abundant  proteins 

The  stability  of  proteome  composition  means  that  a  particular  set  of  proteins  are 
consistently  the  most  abundant  in  the  cell.  One  enumeration  of  this  set  is  listed  in  Table 
2, where all 51 proteins with average fractional abundance over the diel cycle of 0.005 or
greater  are  grouped  by  function.  To  illustrate  the  consistency  of  protein  abundance, 
consider  that  these  51  proteins  occupy  92%  (656/714)  of  the  spots  in  the  combined  top-51 
abundance  rankings  of  the  14  timepoints.  The  biochemical  functions  of  many  of  the 
abundant  proteins  are  ones  expected  from  Prochlorococcus:  carbon  fixation, 
photosynthesis,  ATP  synthesis  and  nutrient  uptake.  Necessary  cellular  functions  such  as 
protein  folding  (chaperonins),  nucleotide  metabolism,  fatty  acid  biosynthesis,  cell 
division  (FtsZ)  and  translation  (TufA)  are  also  represented.  Notably,  the  most  abundant 
central  carbon  metabolism  proteins  are  predominantly  those  involved  in  the  Calvin  cycle; 
none  of  the  oxidative  pentose  phosphate  pathway-specific  proteins  appear  (the  most 
abundant  of  these,  transaldolase,  has  an  average  APEX  score  of  0.0024). 
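
The consistency figure quoted above (656 of 714 slots) is an occupancy ratio: the share of all top-51 positions, pooled over the 14 timepoints, that are filled by the same 51 proteins. A minimal sketch follows; the function name, data layout and toy proteins are hypothetical.

```python
def top_k_occupancy(core_set, abundance_by_timepoint, k):
    """Share of all top-k slots, pooled over timepoints, held by members of core_set.

    abundance_by_timepoint: list of dicts mapping protein name -> fractional abundance."""
    core = set(core_set)
    filled = 0
    for abundances in abundance_by_timepoint:
        top_k = sorted(abundances, key=abundances.get, reverse=True)[:k]
        filled += sum(1 for name in top_k if name in core)
    return filled / (k * len(abundance_by_timepoint))


# Hypothetical example: a two-protein core set tracked over two timepoints.
toy_timepoints = [
    {"A": 0.5, "B": 0.3, "C": 0.2},  # top 2 are A and B: both in the core set
    {"A": 0.5, "C": 0.3, "B": 0.2},  # top 2 are A and C: one in the core set
]
toy_occupancy = top_k_occupancy({"A", "B"}, toy_timepoints, k=2)
```

With k = 51 and 14 timepoints there are 714 slots in total, so 656 filled slots corresponds to 92%.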

A  few  biochemical  systems  appear  in  the  list  of  abundant  proteins  that  have  not  to  date 
been  the  subject  of  detailed  investigation  in  Prochlorococcus.  Two  components  of  the 
thioredoxin system, TrxA and AhpC, are highly abundant. Thioredoxin (TrxA) is a
redox-active protein that interacts with a wide range of enzymes involved in
photosynthesis (Schurmann and Buchanan, 2008). Thioredoxin peroxidase (AhpC) is an
inorganic and alkylperoxide detoxifying enzyme (Parsonage et al., 2008) that uses
thioredoxin as an electron donor; since Prochlorococcus lacks genes for catalase and
heme peroxidases, it likely relies on this thioredoxin-based system to deal with peroxides
that are an inevitable byproduct of its oxygenic metabolism. This may be especially true
in axenic culture conditions (such as the current experiment), as it has been shown that
one of the key “helper” roles of heterotrophic bacteria in co-culture with
Prochlorococcus is as peroxide scavengers (Morris et al., 2008). Nickel-containing
superoxide dismutase was also one of the most abundant proteins, consistent with the
notion that these dense (~10^8 cells/ml), axenic cultures of Prochlorococcus are
oxidatively stressful environments for the cells.

Table 2. The 51 most abundant proteins in MED4, grouped by function. Each of
these proteins has an average fractional abundance over the diel cycle of >0.005
(i.e., >0.5% of all detected protein).

Another surprise is the abundance of two enzymes, CysK1 and MetlV, involved in the
biosynthesis of the sulfur-containing amino acids, cysteine and methionine. These two
enzymes catalyze the same step in their respective pathways: the insertion of sulfide
(from assimilatory sulfate reduction) into O-acetyl(homo)serine. This is unexpected
because Cys and Met are two of the least-common amino acids in the MED4 proteome
(20th- and 17th-most abundant, respectively; see Chapter 5). Moreover, conditions known
to promote high expression of these genes, such as sulfur limitation (Wirtz et al., 2004),
seem  unlikely  in  a  sulfate-rich  medium  like  seawater.  The  high  abundance  of  these 
proteins  is  therefore  puzzling.  It  may  be  that  the  MED4  alleles  of  these  two  enzymes  are 
catalytically  slow,  necessitating  high  abundances  even  to  provide  the  small  required 
supply  of  Cys  and  Met.  It  is  also  possible  that  Prochlorococcus  is  especially  sensitive  to 
sulfide,  and  the  abundance  of  these  enzymes  ensures  that  H2S  produced  by  assimilatory 
sulfate  reduction  is  quickly  utilized  for  its  intended  biosynthetic  purpose  and  does  not 
build  up  in  the  cytosol. 
Figure 12. Rank-abundance structure of the MED4 proteome. Plotted is the proportion of
cellular protein represented by the n most abundant proteins, based on APEX scores.
Each of the 14 diel sampling timepoints is represented by a gray curve, and the average
across the diel dataset is in blue. For comparison, the mRNA abundance curves for the two
timepoints sequenced to date are shown in pink. The shape of the cumulative protein
abundance curve shows relatively little variation across the diel, suggesting that amino
acids are not more evenly distributed among proteins at one time of day than another.
The inset lists the number of proteins accounting for the given percentages of the cellular
proteome (e.g., on average, 75% of protein is in 120 different gene products). With significant
caveats concerning experimental sensitivity and bias, this suggests that a few hundred proteins
account for the bulk of the proteome, while the rest (>1000 in MED4) are quite rare or
non-expressed.

3.5.4 Rank-abundance structure of the proteome

Finally, fractional abundance measurements can provide an estimate of how concentrated
a cell’s total complement of protein is in the most abundant gene products. Figure 12
shows the rank-cumulative abundance structure for the MED4 proteome as measured
over the diel cycle. The 51 most abundant proteins discussed in the previous section
make up, on average, 56% of the total amount of detected protein. The shapes of the
rank-abundance curves for the various timepoints (gray lines) do not vary dramatically,
further evidence that the overall composition of the proteome is generally stable over the
diel cycle. Roughly 572 proteins account for 99% of the detected protein; taken at
face value, this implies that the products of all other genes (a total of 1356 ORFs) sum to
<1% of cellular protein. As discussed above, however, fractional abundance estimates
are blind to undetected proteins, which are not necessarily rare. By comparison, the
higher-coverage transcriptomics datasets (pink lines in Figure 12) suggest that the top
600 gene products make up ~90% of the total. While the estimates are uncertain at this
point, these data do point towards a “long tail” of gene product abundance, with over
1000 gene products accounting for perhaps ~10-20% of the cellular total. Similar “long
tails” have been observed in other biological systems, including human plasma
(Anderson and Anderson, 2002). These distributions illustrate the enormous expansion
in the dynamic range of gene product abundance between DNA and protein.
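
The rank-cumulative abundance curve described above can be sketched in a few lines. This is a minimal illustration rather than the analysis code used in this study: the function names are hypothetical, and the toy scores are invented values standing in for per-protein abundance estimates such as APEX scores.

```python
def cumulative_rank_abundance(scores):
    """Cumulative fraction of the total held by the n most abundant items, n = 1..N."""
    ranked = sorted(scores, reverse=True)
    total = sum(ranked)
    curve = []
    running = 0.0
    for score in ranked:
        running += score
        curve.append(running / total)
    return curve


def items_needed(scores, fraction):
    """Smallest n such that the top-n items hold at least `fraction` of the total."""
    for n, cumulative in enumerate(cumulative_rank_abundance(scores), start=1):
        if cumulative >= fraction:
            return n
    return len(scores)


# Hypothetical abundance scores for eight gene products (not measured values).
toy_scores = [50.0, 25.0, 10.0, 5.0, 4.0, 3.0, 2.0, 1.0]
curve = cumulative_rank_abundance(toy_scores)
print(curve)
print(items_needed(toy_scores, 0.80))
```

Applied to one timepoint at a time, `items_needed` gives the kind of count listed in the inset of Figure 12 (the number of gene products accounting for a given percentage of the total). Like the fractional abundances themselves, it sees only detected proteins.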

4.  Conclusions 

A  principal  finding  of  this  study  is  that  the  strong  diel  periodicity  in  the  expression  of 
many Prochlorococcus genes at the mRNA level (Zinser et al., 2009) is substantially
damped  at  the  protein  level.  While  Zinser  et  al.  (2009)  found  that  82%  of  transcripts  of 
protein-coding  genes  exhibit  a  diel  abundance  cycle,  we  estimate  that  only  15-47%  of 
proteins  cycle  significantly  over  the  diel.  While  mass-spectrometry  based  proteomics 
data is not as rich as microarrays or RNA-sequencing as a probe of gene expression, it is
clear  that,  for  80-90%  of  genes,  the  amplitude  of  oscillation  in  gene  product  abundance  is 
lower at the protein level than at the mRNA level. Additionally, for many proteins the
timing  of  peak  abundance  is  quite  different  from  -  or  even  opposite  to  -  the  maximum  in 
transcript  abundance,  and  these  offsets  are  highly  variable  from  gene  to  gene.  Hence  the 
temporal  dynamics  of  protein  expression  cannot  be  directly  extrapolated  from  transcript 
abundance  timecourses.  Our  results  show  that  substantial  decoupling  of  transcript  and 
protein  abundances  occurs  even  in  a  small,  ‘simple’  organism  like  Prochlorococcus  that 
has  streamlined  metabolism  and  regulatory  systems. 

We  have  also  established  that  the  overall  composition  of  the  proteome,  in  terms  of 
proportional protein abundance, is quite stable over the diel cycle. Proteins that are
highly  abundant  in  the  cell  remain  so  around  the  clock,  not  being  degraded  down  to  low 
levels  and  then  re-synthesized.  Thus  in  proteomic  terms,  Prochlorococcus  is  not  a 
“phototroph  by  day,  heterotroph  by  night”;  the  protein  machinery  for  its  photoautotrophic 
lifestyle  is  present  in  the  cells  in  the  dark,  and  the  enzymes  used  for  respiration  at  night 
are  present  during  the  day  as  well.  The  fact  that  Prochlorococcus  actually  does  perform 
net  C  fixation  during  the  day  and  net  respiration  at  night  suggests  that  its  metabolic 
networks  are  poised  near  a  flux  balance  point.  Relatively  small  changes  in  the  abundance 
of  enzymes  at  key  points  in  the  network,  likely  combined  with  other  forms  of 
posttranslational  regulation,  can  redirect  metabolic  fluxes  to  match  cellular  demands  and 
environmental  conditions.  This  picture  of  metabolism  reflects  the  evolutionary 
refinement  of  microbial  physiology,  whereby  fluxes  through  a  complex  biochemical 
network are modulated by small, targeted shifts rather than system-wide remodeling.

Since most proteins change so little in abundance over the diel cycle, the few that do
oscillate  strongly  stand  out.  In  particular,  ribonucleotide  reductase  (NrdJ)  changes  more 
than  10-fold  in  abundance  between  morning  and  late  afternoon.  This  variation  is 
certainly  coherent  with  NrdJ’s  role  in  DNA  synthesis,  which  occurs  between  noon  and 
sunset.  But  other  cell-cycle-specific  proteins  do  not  oscillate  nearly  so  strongly;  FtsZ  and 
MreB,  for  example,  vary  only  1.3-fold.  Why  the  large  cycle  in  NrdJ?  It  may  be  that  the 
presence of ribonucleotide reductase in the cytosol outside of C/S phase is deleterious; at
other  times,  the  main  goal  of  nucleotide  synthesis  is  RNA,  and  having  material  diverted 
to  DNA  may  be  substantially  counterproductive.  But  the  presence  of  nrdJ  in  phage 
genomes (L.R. Thompson et al., in prep) offers another hypothesis: the low abundance of
NrdJ outside of the DNA synthesis period may be a mode of defense against phage infection.
When  a  bacteriophage  infects  a  host  cell,  it  shuts  down  translation  of  the  host  genome 
and  begins  expressing  its  own  using  the  host’s  machinery.  Thus,  to  make  progeny,  the 
phage  is  dependent  on  the  presence  of  DNA-synthesis  enzymes  (notably  NrdJ)  in  the  host 
cell  at  the  time  of  infection.  If  infection  occurs  when  NrdJ  protein  levels  are  low  (as  they 
are  in  the  morning),  phage  infection  may  stall  for  want  of  deoxyribonucleotides  to  copy 
the  phage  genome.  In  this  scenario,  the  presence  of  ribonucleotide  reductase  genes  in  the 
majority of sequenced cyanophage genomes (Sullivan et al., 2005; Sullivan et al., in
press)  is  a  strategy  adopted  by  phage  to  circumvent  dependence  on  host  NrdJ  by  encoding 
their  own  copy  and  expressing  it  during  infection. 

The  results  of  this  experiment  also  have  implications  for  how  inferences  can  be  drawn 
between  lab-  and  field-based  datasets,  and  what  sorts  of  techniques  might  be  most  useful 
as  assays  of  conditions  and  microbial  activities  in  natural  environments.  First,  it  appears 
that  growth  in  dense,  axenic  culture  imposes  particular  oxidative  stress  on 
Prochlorococcus  cells.  Assessing  the  true  biochemistry  and  regulation  of  the  response  of 
Prochlorococcus  to  reactive  oxygen  species  under  oceanic  conditions  is  therefore  likely 
to  be  challenging,  since  the  cell  densities  currently  required  to  produce  sufficient  material 
for  some  kinds  of  molecular  analyses  (as  well  as  the  absence  of  exogenous  reductants 
produced  by  other  members  of  the  community)  are  a  source  of  oxidative  stress.  Second, 
if  measurements  of  gene  product  abundances  are  to  be  used  as  reporters  of  the 
physiological  activities  of,  and  environmental  stresses  felt  by,  organisms  in  natural 
settings,  then  a  multilevel,  systems  view  of  gene  expression  and  regulation  is  warranted. 
This  experiment  exemplifies  how  the  dynamics  of  protein  abundance  can  be  very 
different  in  both  timing  and  magnitude  from  those  of  transcripts,  yet  for  a  variety  of 
methodological reasons (such as the ability to selectively amplify a gene of interest), it is
often  desirable  to  assay  gene  expression  at  the  transcript  level.  Given  the  findings 
presented  here,  and  similar  results  emerging  from  systems  biology  more  generally  (Beyer 
et al., 2004; Nie et al., 2007), we suggest that complementing transcript-level
measurements  with  information  from  other  biological  metrics  (be  they  proteins, 
metabolites,  activity  assays,  etc.)  will  often  be  valuable  for  drawing  accurate 
biogeochemical  inferences. 

The  quantitative  concordance  between  levels  of  biological  organization  (transcriptome, 
proteome,  metabolism)  is  likely  to  be  greatest  in  concerted  responses  to  acute  stresses 
(Halbeisen  and  Gerber,  2009).  The  majority  of  laboratory  gene  expression  studies  test 
such  responses,  as  the  contrast  between  stress  and  control  conditions  is  distinct  and 
produces  strong  experimental  signals.  With  the  exception  of  studies  of  specific 
perturbation  events,  however,  observations  of  natural  populations  are  generally  not 
studies  of  acute  stress  responses.  When  we  look  at  a  microbial  community  in  its  natural 
habitat,  we  are  seeing  a  group  of  organisms  that  have  necessarily  adapted  to  that 
environment  over  many  generations.  When  we  contrast  two  environments  that  differ  in, 
for  example,  availability  of  a  particular  nutrient,  the  adaptation  involved  is  to  a  chronic 
stressor,  and  is  not  the  same  as  the  kind  of  acute  nutrient  limitation  that  is  often  imposed 
in the laboratory. The longest-running experimental evolution study, the E. coli Long-term
Evolution Experiment (Barrick et al., 2009), has tracked populations over 40,000
generations,  and  most  steady-state  culture  studies  (e.g.,  in  chemostats)  are  much  shorter. 
Prochlorococcus  has  undergone  roughly  40,000  generations  just  since  the  year  1900, 
while  living  in  an  ocean  undergoing  both  natural  and  anthropogenic  change  on  a 
continuum  of  timescales  ranging  from  decadal  to  geologic.  Experimental  nutrient 
limitation  in  culture  can  help  identify  important  genes  and  functions  (e.g.,  Martiny  and 
Coleman et al., 2006), but the quantitative dynamics of that response are distinct from
those  involved  in  long-term  adaptation.  The  laboratory  study  presented  here  attempts  to 
approximate,  however  imperfectly,  the  quotidian  activities  of  Prochlorococcus  over  its 
natural, unstressed diel growth cycle. In taking a multilevel view of gene expression and
metabolic  regulation,  we  hope  to  contribute  to  bridging  the  gap  between  laboratory  and 
environmental  characterizations  of  microbial  roles  in  biogeochemistry. 

Supplementary Figure 1. Timecourses for 548 proteins quantified over the diel cycle in
Prochlorococcus MED4. For each protein, relative abundance is shown on a log2 scale.
Protein-level abundance at each timepoint was calculated as the mean of a lognormal fit to the
filtered peak-level abundance ratios, and error bars indicate 95% confidence intervals based on
an unbiased estimate of the standard error of that mean (Sec. 2.4.1). Titles indicate the gene
locus and name, where applicable. In the lower right corner of each timecourse panel is the
q-value calculated for the significance of diel cycling (Sec. 2.4.2).

Supplementary Figure 2. Identifications of decoy proteins (i.e., reversed MED4
sequences) in the full diel dataset (blue bars) and after filtering on the basis of peak ratio
CV (see Sec. 2.4.1), as a function of the number of timepoints at which each decoy
sequence was found. The false discovery rate (FDR), calculated from the ratio of decoy IDs
to MED4 protein IDs, is also shown, as a function of the minimum number of timepoints
required for an ID. In the unfiltered dataset, with only one timepoint required, the FDR is
3.2%; if two are required, the FDR drops to 1.6%. Filtering based on peak ratio CV reduces
the maximum number of timepoints at which any one decoy protein is found from 9 to 5;
further filtering for expression timecourses eliminated decoy data entirely, indicating a
protein-ID FDR in the diel timeseries dataset of <0.2%.


Supplementary Figure 3. Comparison of fractional abundance estimates using normalized
spectral counting-based (APEX; Lu et al. 2007) and label-free, MS1-based (Top3; Silva
et al. 2006) techniques. Blue points (n=1799) are protein timepoints for which both APEX
and Top3 abundance estimates were obtained; for the red points (n=7013), only APEX
values were calculated. Note that, in this case, detection of a minimum of 10 peptides was
required for calculation of a Top3 score.

Biogeochemical insights from Prochlorococcus systems biology


Jacob  R.  Waldbauer  and  Sallie  W.  Chisholm 


In  order  to  see  cells  as  biogeochemical  agents  and  incorporate  cellular  metabolism  into  a 
systems-level  view  of  natural  environments,  an  understanding  of  organisms’  molecular 
composition  is  fundamental.  Specifying  molecular  composition  -  albeit  largely  in  a 
laboratory  context  -  has  emerged  as  a  basic  goal  of  systems  biology  en  route  to  an 
integrated  view  of  cellular  function.  The  tools  developed  for  systems  biology  with  mainly 
biomedical  or  biotechnological  goals  in  mind  have  been  increasingly  turned  to  address 
questions  in  biogeochemistry  and  environmental  science.  Chapter  4  presented  an  example 
of  using  such  tools,  including  microarrays,  proteomics  and  RNA-sequencing,  to  assess  the 
inventories  and  dynamics  of  gene  products  in  Prochlorococcus.  In  this  chapter,  we  take  a 
holistic  view  of  the  molecular  composition  of  Prochlorococcus  cells,  first  modeling  how 
elemental  budgets  are  apportioned  among  major  pools  of  biochemicals,  and  then  addressing 
how  cellular  molecular  composition  informs  hypotheses  concerning  genome  evolution  and 
stochastic  gene  expression  effects.  We  propose  that  genome  streamlining  in 
Prochlorococcus  is  unlikely  to  be  driven  primarily  by  adaptation  to  oligotrophic  conditions, 
and  that  noise  in  gene  expression  may  play  an  ecological  role  in  limiting  the  growth  rates  of 
very  small  cells. 


1. Elemental and biochemical budgets of Prochlorococcus cells

We  consider  first  the  major  elemental  (carbon,  nitrogen,  phosphorus)  budgets  of  a 
‘hypothetical’  high-light  Prochlorococcus  cell.  Our  goal  is  to  provide  a  first-order 
accounting  of  biochemical  components,  on  the  basis  of  available  experimental  and 
genomic data. We focus on the B/G1 period of the cell cycle, when the cell has a single
copy of its chromosome (Wang and Levin, 2009); for Prochlorococcus, this period
extends from approximately midnight to mid-afternoon each day. Populations will
naturally  show  variation  around  the  values  presented,  and  that  variation  is  itself  likely  to 
be  interesting  and  ecologically  relevant.  The  calculations  and  assumptions  involved  in 
deriving  these  values  are  described  in  the  following  section. 

1.1 Budgeting calculations


1.1.1  Elemental  stoichiometry 

Several  studies  have  presented  measurements  of  the  elemental  composition  of 
Prochlorococcus cells (Bertilsson et al., 2003; Heldal et al., 2003; Fu et al., 2007) and
provide  a  basis  for  major-element  budgets.  These  papers  report  a  range  of  elemental 
contents  and  C:N:P  stoichiometries  for  Prochlorococcus  cells.  Here  we  treat  the  C:N 
ratio  as  an  independent  variable,  and  allow  it  to  range  over  the  span  reported  in  the 
literature (5.7-9.9). We further take the cellular P content to be 13 amol, which, as
discussed  further  below,  represents  the  majority  of  results  reported  in  all  three  studies.  At 
a  given  C:N  ratio,  then,  the  C:P  ratio  is  chosen  so  that  the  cellular  carbon  budget  (detailed 
below)  closes  (i.e.,  such  that  the  cellular  C  content  equals  the  sum  of  the  constituent 
biochemicals).  The  resultant  modeled  elemental  contents  and  C:N:P  ratios  are  shown  in 
Figures  1  and  2,  plotted  along  with  the  literature  values.  The  elemental  budgets  for  three 
cases  (I,  II  and  III;  indicated  in  Figs.  1  and  2)  that  span  the  range  of  modeled 
stoichiometries  are  included  in  Table  1;  the  assumptions  and  calculations  behind  these 
budgets are detailed in Sections 1.1.2-1.1.4, and the results are discussed in Section 1.2.

The  method  for  constructing  cellular  elemental  budgets  used  here  was  designed  to 
produce  self-consistent,  closed  budgets,  so  that  the  distribution  of  C,  N  and  P  among 
biochemical  pools  can  be  meaningfully  interpreted.  As  shown  in  Figures  1  and  2,  no 
single  set  of  elemental  contents  or  ratios  can  encompass  all  of  the  literature  data,  which  in 
any  case  represent  differing  strains,  media,  culture  conditions  and  growth  rates.  It  is  also 
not generally possible to close budgets based on a fixed stoichiometry without including
a ‘mystery pool’ of biochemicals of arbitrary composition and size, chosen solely to
account  for  material  stipulated  by  the  stoichiometry.  Including  such  a  chemically  ill- 
defined  component  of  the  cells  renders  the  proportional  demands  of  major  constituents 
such  as  protein  and  RNA  much  less  interpretable.  The  modeled  cellular  C  and  N  contents 
do match the experimental values quite well (Figure 1A). The choice was made to fix the

Figure 1. Cellular contents of carbon, nitrogen and phosphorus, from literature reports
and  as  modeled  here.  The  phosphorus-limited  measurement  by  Bertilsson  et  al.  (2003) 
(the  only  data  from  nutrient-limited  culture)  is  indicated  by  the  -P.  The  content  values  for 
the  three  model  cases  summarized  in  Table  1  (I,  II,  III)  are  indicated  by  circled  symbols. 

Figure 2. Cellular C:N:P stoichiometries, from literature reports and as modeled here.
The  phosphorus-limited  measurement  by  Bertilsson  et  al.  (2003)  (the  only  data  from 
nutrient-limited  culture)  is  indicated  by  the  -P.  The  ratio  values  for  the  three  model  cases 
summarized  in  Table  1  (I,  II,  III)  are  indicated  by  circled  symbols. 

                        Phosphorus   Nitrogen                  Carbon
Model Case                           I       II      III       I       II      III
Ratio to Phosphorus     1            53      33      17        302     225     167
Cell Content (fmol)     0.013        0.69    0.42    0.22      3.93    2.93    2.17
Chromosome              42.4%        2.9%    4.8%    9.0%      1.4%    1.9%    2.5%
RNA
  rRNA                  40.0%        3.0%    4.8%    9.1%      1.3%    1.7%    2.3%
  tRNA                  7.5%         0.6%    0.9%    1.7%      0.2%    0.3%    0.4%
  mRNA                  2.5%         0.2%    0.3%    0.6%      0.1%    0.1%    0.1%
Protein                 -            91.2%   85.6%   72.8%     60.5%   46.9%   28.4%
Membrane Lipids         5.0%
  LPS                   -            0.5%    0.9%    1.7%      5.2%    7.0%    9.4%
  Cell Envelope         -            -       -       -         8.3%    11.1%   15.0%
  Thylakoids            -            -       -       -         15.4%   20.6%   27.8%
Peptidoglycan           -            1.0%    1.6%    2.9%      0.9%    1.3%    1.7%
Pigments
  Chlorophyll           -            0.6%    1.0%    2.0%      1.5%    2.1%    2.8%
  Zeaxanthin            -            -       -       -         2.2%    3.0%    4.0%
Metabolites             2.5%         0.10%   0.16%   0.31%     3.0%    4.0%    5.4%

Table 1. The distributions of carbon, nitrogen and phosphorus contents among major
biochemical constituents, for three model cases (I, II and III) with cellular stoichiometries
as indicated. Note that the cellular phosphorus content is the same (13 amol) for all
cases.

cellular P content at 13 amol because values in that range appear in all three studies and it
appears to approximate a lower limit of P content in Prochlorococcus (Figure 1B and C).
We  are  less  interested  here  in  precisely  modeling  exponential  growth  under  optimum, 
nutrient-replete  conditions  than  in  assessing  how  a  cell  can  be  built  with  the  small 
nutrient  complements  likely  to  prevail  in  oligotrophic  habitats.  In  our  judgment,  the  sole 
measurement  of  Prochlorococcus  cellular  elemental  stoichiometry  under  nutrient-limited 
conditions  (that  of  Bertilsson  et  al.,  2003,  indicated  in  Figures  1  and  2),  is  quite  relevant 
to  real  oceanic  conditions  and  should  be  given  due  consideration.  The  C:N:P 
stoichiometries  that  result  from  this  model  approach  those  found  by  Heldal  et  al.  (2003) 
at  higher  C:N  ratios,  and  C:P  and  N:P  ratios  rise  as  C:N  drops  (Figure  2). 

These  broad  uncertainties  in  cellular  elemental  composition  mean  that  the  biochemical 
budgets  outlined  here  should  be  considered  flexible.  Certainly,  a  more  precisely 
constrained  composition  would  be  required  for  detailed  systems  modeling,  such  as  flux 
balance  analysis.  Nevertheless,  the  kinds  of  budgeting  calculations  described  here  can 
readily  accommodate  a  range  of  stoichiometries,  and  hopefully  future  data  (especially 
from  field  samples)  will  better  constrain  the  C:N:P  ranges  most  relevant  to  oceanic 
conditions. 

1.1.2  Phosphorus  budget 

Phosphorus  is  present  in  Prochlorococcus  as  a  component  of  several  types  of 
biochemicals:  nucleic  acid  polymers  (DNA  and  RNA),  phospholipids,  and  small 
molecule  metabolites.  The  amount  in  the  chromosome  can  be  calculated  exactly  from  the 
genome size (1.66 Mbp). Phospholipids are rather rare in Prochlorococcus compared to
most other microbes, likely as an adaptation to chronic P scarcity (Van Mooy et al., 2006).
We  calculate  the  total  number  of  lipid  molecules  in  the  cell  by  assuming  a  typical 
geometry  for  Prochlorococcus:  the  cell  envelope  (outer  membrane,  murein  layer  and 
inner membrane) is set at 0.6 µm in diameter. Within the cell are two complete, spherical
thylakoid membranes averaging 0.5 µm in diameter (compare Fig. 1A-D of Ting et al.,
2007) - for geometric simplicity, we neglect the small ‘hairpin’ region where each
membrane  wraps  around  itself  to  enclose  the  lumen.  Thus  the  inner  and  outer  cell 
membranes  comprise  4  leaflets  (2  bilayers),  while  the  thylakoids  comprise  8  leaflets  (4 
bilayers), and we assume a membrane area of 0.55 nm² per lipid molecule (Lopez
Cascales et al., 1996) and that lipids constitute 80% of total membrane area (Nikaido,
1996). Hence our hypothetical cell contains about 14 million lipid molecules, 2% of
which are assumed to be phosphatidylglycerol (Van Mooy et al., 2006). Little data exists
on metabolites or their concentrations in Prochlorococcus; here we assume a cytosolic
concentration of 1 mM for nucleotide triphosphates (e.g., ATP, GTP), probably the most
abundant P-containing metabolites (Bennett et al., 2009). Finally, all of the 13 amol P
complement  not  accounted  for  by  DNA,  lipids  and  metabolites  is  assumed  to  be  RNA, 
thereby  closing  the  cellular  phosphorus  budget.  Total  RNA  is  assumed  to  be  80%  rRNA, 
15%  tRNA  and  5%  mRNA  (Bremer  and  Dennis,  2008). 
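
The phosphorus bookkeeping described above can be sketched as follows. This is a rough illustration under stated assumptions, not the calculation actually used to build Table 1. Two choices in the sketch are assumptions that the text does not state: the cytosol is taken to be a sphere 0.6 µm in diameter, and only 3 envelope leaflets are counted as glycerolipid (treating the outer leaflet of the outer membrane as LPS), which is the count that gives roughly 14 million lipids. Because of these choices the outputs need not match Table 1 exactly. All function names are hypothetical.

```python
import math

AVOGADRO = 6.022e23       # molecules per mole
P_QUOTA_AMOL = 13.0       # fixed cellular P content used in the text
GENOME_BP = 1.66e6        # genome size in base pairs, from the text


def atoms_to_amol(n_atoms):
    """Convert a number of atoms to attomoles."""
    return n_atoms / AVOGADRO * 1e18


def sphere_area_nm2(diameter_um):
    """Surface area of a sphere, in nm^2, from its diameter in micrometres."""
    return math.pi * (diameter_um * 1000.0) ** 2


def sphere_volume_litres(diameter_um):
    """Volume of a sphere in litres (1 um = 1e-5 dm; 1 dm^3 = 1 L)."""
    radius_dm = diameter_um * 1e-5 / 2.0
    return 4.0 / 3.0 * math.pi * radius_dm ** 3


def lipid_molecules(envelope_leaflets=3, thylakoid_leaflets=8):
    """Lipid count: 80% of leaflet area is lipid, 0.55 nm^2 per lipid.

    envelope_leaflets=3 is an assumption of this sketch (outer leaflet = LPS).
    """
    area = (envelope_leaflets * sphere_area_nm2(0.6)
            + thylakoid_leaflets * sphere_area_nm2(0.5))
    return 0.8 * area / 0.55


def phosphorus_budget():
    """Return the P budget in amol; RNA is the remainder that closes it."""
    dna = atoms_to_amol(2 * GENOME_BP)               # two phosphates per base pair
    lipid = atoms_to_amol(0.02 * lipid_molecules())  # 2% phosphatidylglycerol, 1 P each
    # 1 mM nucleotide triphosphates, 3 P each; cytosol volume is an assumption.
    ntp = 3 * 1e-3 * sphere_volume_litres(0.6) * 1e18
    rna = P_QUOTA_AMOL - dna - lipid - ntp
    return {
        "DNA": dna,
        "phospholipid": lipid,
        "NTP": ntp,
        "rRNA": 0.80 * rna,
        "tRNA": 0.15 * rna,
        "mRNA": 0.05 * rna,
    }


budget = phosphorus_budget()
for pool, amol in budget.items():
    print(f"{pool:13s} {amol:6.2f} amol  {100 * amol / P_QUOTA_AMOL:5.1f}%")
```

The design point is that RNA is never measured directly in this scheme. It is whatever remains once the geometrically and genomically constrained pools have been subtracted from the fixed P quota, so any error in those pools is passed straight into the RNA estimate.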

1.1.3  Nitrogen  budget 

Nitrogen  in  DNA  is  again  calculated  exactly  from  the  genome  size  and  base  composition 
(30.8%  G+C).  Nitrogen  in  RNA  is  calculated  from  the  RNA  P  content,  with  an  RNA 
N:P  ratio  of  3.8:1  derived  from  MED4  RNA  sequences.  To  calculate  the  N  in 
metabolites,  we  assume  that  the  metabolite  pool  is  dominated  by  compatible  solutes  at  a 
concentration of 0.1 M (Bennett et al., 2009). The compatible solute pool is taken to be
composed of sucrose / glucosylglycerate / glutamate in a 10:5:1 stoichiometric ratio
(Klahn et al., 2009); hence metabolite N reflects cytosolic glutamate. Nitrogen in
chlorophyll is based on a chlorophyll content of 1 fg/cell, an average of literature values
that range from 0.3-5 fg/cell (Morel et al., 1993; Moore and Chisholm, 1999; Claustre et
al., 2002). The amount in peptidoglycan is again calculated from the cell geometry,
assuming a cell-wall area of 2 nm² per disaccharide unit (Vollmer and Holtje, 2004), and
an N content of 7 atoms/unit. Nitrogen in lipopolysaccharide (LPS) is also calculated
from membrane area, assuming a membrane area of 0.202 nm² per fatty acid chain in
LPS. LPS structure was assumed to be the same as that reported by Snyder et al. (2009)
for strains of marine Synechococcus, with two nitrogen atoms and four fatty acid chains
per  unit,  resulting  in  roughly  1.1  million  LPS  units  per  cell.  Finally,  all  remaining 
nitrogen  is  assumed  to  be  in  protein,  thereby  closing  the  N  budget.  The  number  of  amino 
acids  in  the  cell  can  be  calculated  from  the  average  nitrogen  content  of  1.310  N  per 
residue,  derived  from  the  composition  of  the  MED4  proteome  as  measured  over  the  diel 
cycle  (see  Sec.  2). 
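
The nitrogen bookkeeping can be sketched in the same way. Again this is an illustration under assumptions rather than the calculation behind Table 1. The following are assumptions of the sketch, not statements from the text: the envelope and the cytosol are treated as a sphere 0.6 µm in diameter, the 80% lipid coverage is applied to the LPS leaflet, chlorophyll is given a molar mass of about 891.5 g/mol with 4 N atoms per molecule, and each A-T and G-C base pair is counted as carrying 7 and 8 N atoms respectively. The example inputs (420 amol total N and 6.5 amol RNA P) are taken from the Case II column of Table 1. The function name is hypothetical.

```python
import math

AVOGADRO = 6.022e23  # molecules per mole


def atoms_to_amol(n_atoms):
    """Convert a number of atoms to attomoles."""
    return n_atoms / AVOGADRO * 1e18


def nitrogen_budget(total_n_amol, rna_p_amol):
    """Return (pools in amol N, LPS units per cell, amino acid residues per cell)."""
    envelope_area_nm2 = math.pi * 600.0 ** 2             # sphere, 0.6 um diameter
    cell_volume_l = 4.0 / 3.0 * math.pi * (3e-6) ** 3    # radius 0.3 um = 3e-6 dm

    gc = 0.308                                           # G+C fraction from the text
    n_per_bp = 7 * (1 - gc) + 8 * gc                     # A-T pair: 7 N; G-C pair: 8 N
    lps_units = 0.8 * envelope_area_nm2 / 0.202 / 4.0    # 4 fatty acid chains per unit

    pools = {
        "DNA": atoms_to_amol(1.66e6 * n_per_bp),
        "RNA": 3.8 * rna_p_amol,                         # RNA N:P of 3.8:1
        # Glutamate is 1/16 of a 0.1 M solute pool and carries one N atom.
        "metabolites": 0.1 / 16.0 * cell_volume_l * 1e18,
        # 1 fg chlorophyll per cell; molar mass and 4 N per molecule are assumed.
        "chlorophyll": 4 * 1e-15 / 891.5 * 1e18,
        "peptidoglycan": atoms_to_amol(7 * envelope_area_nm2 / 2.0),
        "LPS": atoms_to_amol(2 * lps_units),
    }
    # Protein is the remainder that closes the nitrogen budget.
    pools["protein"] = total_n_amol - sum(pools.values())
    residues = pools["protein"] * 1e-18 * AVOGADRO / 1.310   # 1.310 N per residue
    return pools, lps_units, residues


pools, lps_units, residues = nitrogen_budget(total_n_amol=420.0, rna_p_amol=6.5)
for name, amol in pools.items():
    print(f"{name:14s} {amol:7.2f} amol  {100 * amol / 420.0:5.1f}%")
print(f"LPS units per cell: {lps_units:.3g}; amino acid residues per cell: {residues:.3g}")
```

As with phosphorus, the largest pool is obtained by difference. The residue count returned here is the quantity that the carbon budget then uses to assign carbon to protein.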

1.1.4  Carbon  budget 

Carbon  in  DNA  is  calculated  from  genome  size  and  base  composition.  Carbon  in  RNA  is 
calculated  from  P  content,  with  an  RNA  C:P  ratio  of  9.5:1.  Metabolite  C  is  calculated 
based  on  the  same  assumptions  about  compatible  osmolytes  made  in  the  nitrogen  budget. 
Chlorophyll  C  is  again  based  on  1  fg  Chl/cell,  and  zeaxanthin  C  is  based  on  a  Zea:Chl 
ratio of 1.25 (wt/wt; Claustre et al., 2002). Protein C is calculated from the cellular
amino  acid  content,  with  an  average  residue  composition  of  5.075  C  per  amino  acid, 
based  on  protein  sequences.  Lipid  carbon  is  based  on  the  measurements  of  Van  Mooy  et 
al.  (2006),  which  suggest  an  average  C  content  of  39.7  atoms  per  lipid  molecule.  Based 
on  the  cell  geometry  calculations  described  above,  there  are  approximately  20  million 
lipid  molecules  in  our  hypothetical  Prochlorococcus  cell.  This  is,  however,  probably  the 
most  uncertain  component  of  the  cellular  carbon  budget,  as  membrane  geometry  and  lipid 
composition  are  likely  to  vary  strongly  with  growth  rate,  light  intensity  and  temperature. 
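
One way to express the closure condition of Section 1.1.1 (choosing the C:P ratio so that the carbon budget closes at a given C:N ratio) is as a single linear equation. Protein N is total N minus non-protein N, total N is total C divided by the C:N ratio, and protein C is protein N multiplied by the residue C:N ratio (5.075/1.310). Solving for total C gives the expression in the sketch below. This formulation is an interpretation of the text rather than the documented procedure, the function name is hypothetical, and the example inputs (1556 amol non-protein C, 60 amol non-protein N) are illustrative values roughly back-calculated from the Case II column of Table 1.

```python
C_PER_RESIDUE = 5.075   # average C atoms per amino acid residue (from the text)
N_PER_RESIDUE = 1.310   # average N atoms per amino acid residue (from the text)


def close_carbon_budget(cn_ratio, nonprotein_c_amol, nonprotein_n_amol):
    """Solve for the total cellular C (amol) at which the carbon budget closes.

    Closure condition:
        C_total = C_nonprotein + k * (C_total / cn_ratio - N_nonprotein)
    where k is the C:N ratio of the average protein residue.
    """
    k = C_PER_RESIDUE / N_PER_RESIDUE
    if cn_ratio <= k:
        raise ValueError("cellular C:N must exceed the protein C:N for closure")
    total_c = (nonprotein_c_amol - k * nonprotein_n_amol) / (1.0 - k / cn_ratio)
    total_n = total_c / cn_ratio
    protein_n = total_n - nonprotein_n_amol
    protein_c = k * protein_n
    return {
        "total_c": total_c,
        "total_n": total_n,
        "protein_c": protein_c,
        "protein_n": protein_n,
    }


# Illustrative inputs only (amol); see the caveats in the paragraph above.
result = close_carbon_budget(cn_ratio=6.9,
                             nonprotein_c_amol=1556.0,
                             nonprotein_n_amol=60.0)
c_to_p = result["total_c"] / 13.0   # C:P ratio implied by the 13 amol P quota
print(result, c_to_p)
```

With these inputs the solved total is near 3 fmol C per cell, comparable to Case II. The expression also shows why modeled C and N contents rise steeply as the cellular C:N ratio falls towards that of protein: the denominator approaches zero.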

1.2  Budgeting  results 

Three  scenarios  of  C:N:P  stoichiometry  span  the  modeled  range:  cases  I  (302:53:1),  II 
(225:33:1),  and  III  (167:17:1).  Case  III  is  likely  most  analogous  to  exponential  growth  in 
culture  under  nutrient-replete  conditions.  Case  I  reflects  extreme  phosphorus  limitation, 
akin  to  the  results  of  Bertilsson  et  al.  (2003),  and  possibly  applicable  to  exceptionally  P- 
poor  oceanic  regions  such  as  the  Sargasso  Sea  (Ammerman  et  al.,  2003)  or  eastern 
Mediterranean  (Thingstad  et  al.,  2005),  though  whether  the  high  cell  contents  of  C  and  N 
suggested by the model would actually obtain in such situations is uncertain. Case II is
probably  the  most  relevant  for  the  majority  of  conditions  encountered  by 
Prochlorococcus  in  oligotrophic  marine  habitats.  The  range  presented  here  should  not  be 
construed  to  necessarily  encompass  the  whole  range  of  possible  compositions  of 
Prochlorococcus  cells  in  the  world’s  oceans.  In  particular,  should  a  cell  operate  with  a 
phosphorus  quota  substantially  larger  or  smaller  than  13  amol,  a  number  of  budgetary 
aspects  could  be  qualitatively  different.  However,  as  discussed  in  the  next  section,  the  P 
quota  can  hardly  be  much  smaller,  and  given  that  most  of  the  elemental  stoichiometry 
data is from P-replete cultures, it seems unlikely to be very much larger. Just how ‘hard-wired’
the phosphorus quota is into a given strain of Prochlorococcus remains an
intriguing  and  important  question. 

1.2.1  Phosphorus  economy 

The  P  budget  of  our  hypothetical  Prochlorococcus  cell  illustrates  the  “phosphorus 
economy”  (Coleman  and  Chisholm,  2007)  of  small  cells  adapted  to  nutrient-poor 
environments.  Note  that  the  phosphorus  budget  is  the  same  for  cases  I,  II  and  III,  since 
the  cellular  P  content  is  fixed  in  this  model.  More  than  40%  of  the  cellular  P  quota  is 
devoted  to  the  ‘fixed  cost’  of  the  chromosome.  This  makes  the  fact  that  most 
Prochlorococcus  cells  contain  a  substantial  number  of  rare  flexible  genes  (i.e.,  genes 
present in some but not all Prochlorococcus strains; Kettler et al., 2007) in their genomes
-  likely  recently  acquired  and  only  occasionally  with  adaptive  value  -  all  the  more  telling 
of  the  importance  of  continuous  gene  exchange  to  microbial  ecology  and  evolution 
(Coleman  and  Chisholm,  submitted).  We  discuss  the  relationship  between  genome  size, 
composition  and  nutrient  budgets  in  detail  in  Section  2,  below. 

The  amount  of  P  in  rRNA  (5.2  amol)  is  sufficient  to  build  598  ribosomes.  This  is  quite  a 
small number; by comparison, log-phase E. coli cells have between 8,000 and 73,000
ribosomes,  depending  on  growth  rate  (Bremer  and  Dennis,  2008).  Similarly,  the  number 
of Prochlorococcus tRNA molecules predicted from the budget (7,866) is 10- to 100-fold
lower than E. coli (74,000-680,000; Bremer and Dennis, 2008). The predicted
amount of mRNA is also quite small: just 200 kb (compared to 1.0-4.3 Mb in E. coli;
Bremer  and  Dennis,  2008),  which  allows  only  about  12%  of  the  genome  to  be  transcribed 
at  any  given  time.  Together,  the  DNA  to  encode  genes  and  the  RNA  to  express  them 
account  for  over  90%  of  the  cellular  phosphorus  budget.  To  a  first  approximation, 
Prochlorococcus  uses  any  acquired  P  for  nucleic  acids  -  about  half  for  DNA,  half  for 
RNA. 
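
The conversion from RNA phosphorus to nucleotides and molecule counts is simple arithmetic, sketched below. The sketch takes the RNA share of the P quota from Table 1 (40.0% + 7.5% + 2.5% of 13 amol) and uses one phosphate per nucleotide in polymeric RNA. The average tRNA length of 75 nt is an assumption of the sketch, so the tRNA count it gives will differ slightly from the figure quoted above. Variable and function names are hypothetical.

```python
AVOGADRO = 6.022e23   # molecules per mole
GENOME_BP = 1.66e6    # genome size in base pairs, from the text
P_QUOTA_AMOL = 13.0   # fixed cellular P content


def nucleotides_from_p(p_amol):
    """Polymeric RNA carries one phosphate per nucleotide."""
    return p_amol * 1e-18 * AVOGADRO


rna_p_amol = 0.50 * P_QUOTA_AMOL                    # RNA share of cellular P (Table 1)
mrna_nt = nucleotides_from_p(0.05 * rna_p_amol)     # 5% of RNA is mRNA
trna_nt = nucleotides_from_p(0.15 * rna_p_amol)     # 15% of RNA is tRNA

# Fraction of the genome that could be represented once in the mRNA pool.
mrna_fraction_of_genome = mrna_nt / GENOME_BP

# Assumed average tRNA length (not taken from the text).
TRNA_LENGTH_NT = 75.0
trna_molecules = trna_nt / TRNA_LENGTH_NT

print(f"mRNA: {mrna_nt / 1e3:.0f} kb, {100 * mrna_fraction_of_genome:.0f}% of genome length")
print(f"tRNA molecules: {trna_molecules:.0f}")
```

The same function applies to rRNA once a total rRNA length per ribosome is specified. That length is left out here because the text does not state the value it used.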

The  other  two  phosphorus-containing  classes  of  biochemicals,  phospholipids  and 
metabolites,  are  relatively  minor  in  the  cellular  P  budget.  The  low  phospholipid  content 
of  picocyanobacteria,  especially  Prochlorococcus  (~2  mol%),  has  recently  been  explored 
in  detail  by  Van  Mooy  et  al.  (2006;  2009).  Interestingly,  Prochlorococcus  does  not  seem 
to  upregulate  phospholipid  biosynthesis  even  when  grown  in  P-replete  conditions  (Van 
Mooy  et  al.,  2006).  This  is  in  stark  contrast  to  other  bacteria:  marine  heterotrophs  contain 
>80%  phospholipids  (Van  Mooy  et  al.,  2006),  while  E.  coli  uses  phospholipids  almost 
exclusively  (Cronan  and  Rock,  2008).  The  amount  of  phosphorus  in  metabolites,  on  the 
other  hand,  is  more  uncertain.  Little  metabolome-wide  quantitative  data  exists  for 
bacteria, and none in Prochlorococcus; the estimates here are based on the recent
comprehensive  study  by  Bennett  et  al.  (2009)  in  E.  coli.  It  has  recently  been  shown  that 
sucrose  is  the  major  compatible  solute  in  Prochlorococcus  cells,  with  glucosylglycerate 
taking  the  place  of  glutamate  as  the  second-most  abundant  metabolite  under  N-limiting 
conditions  (Klahn  et  al.,  2009).  While  glucosylglycerate  contains  neither  P  nor  N,  it  is 
synthesized from relatively P-rich precursors (ADP-glucose and 3-phosphoglycerate),
suggesting that the glucosylglycerate / glutamate exchange may also be a P/N
substitution. 

1.2.2 Nitrogen

The great majority of cellular nitrogen - 80 to 90% - is in protein (Figure 3A), in accord
with findings for other phytoplankton and bacteria generally (Falkowski and Raven,
2007). Even at the highest C:N ratios considered (case III), the next largest pool of N,
RNA, is less than 10% of the total. It is worth noting, however, that the amount of N in
peptidoglycan (up to 3%) was calculated for a monolayer. If there are conditions under
which Prochlorococcus produces a much thicker cell wall, peptidoglycan could become a
more significant part of its nitrogen requirements. Metabolite N is quite small (0.1-0.3%),
since Prochlorococcus uses carbohydrates as its primary compatible solutes. As
noted above, Prochlorococcus seems to engage in a glucosylglycerate-for-glutamate
substitution under N limitation, which could have a marginal impact on the overall
nitrogen budget.

Figure 3. The distributions of cellular nitrogen (A) and carbon (B) among pools of
biochemicals, as a function of modeled C:N ratio. The C:N ratios of the three model cases
summarized in Table 1 (I=5.7; II=6.9; III=9.7) are indicated.

Over the C:N range considered, the number of amino acids in the cell varies almost
fourfold, from 0.75x10^8 to 2.9x10^8, comprising between 290,000 and 1.12 million
protein molecules. For comparison, exponentially-growing E. coli contains 7.6x10^8 to
23.7x10^8 amino acids per cell (Bremer and Dennis, 2008). Based on the results in
Chapter  4,  more  than  half  of  the  amino  acids  are  concentrated  in  the  50  most  abundant 
proteins.  Having  an  estimate  of  the  amount  of  cellular  protein  also  allows  further 
interpretation  of  the  fractional  abundance  values  (APEX  scores),  though,  as  noted  in 
Chapter  4,  these  will  generally  be  overestimates  of  the  true  proportional  abundance  of  a 
given  protein.  In  particular,  we  can  see  what  the  range  of  fractional  abundances  implies 
in  copy-number  terms.  The  most  abundant  proteins  have  fractional  abundances  up  to 
~0.05, implying a range of 15,000-50,000 copies per cell. The lowest observed APEX
scores from the diel experiment are ~10^-5, which would equate to between 3 and 11
copies  per  cell.  As  the  low  end  of  the  calculated  fractional  abundance  range  is  of  order 
single-copy,  it  appears  that  the  dynamic  range  of  protein  abundances  in  Prochlorococcus 
MED4  spans  roughly  four  orders  of  magnitude. 
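
As a rough check on these copy-number estimates, the conversion from fractional abundance to copies per cell can be sketched as below. The protein totals and fractional abundances are the values quoted above; the function and variable names are hypothetical, and the text's figures appear to be rounded, so the upper bound for the most abundant proteins comes out somewhat higher than the quoted 50,000.

```python
import math

# Lower and upper estimates of protein molecules per cell, from the text
PROTEINS_PER_CELL = (290_000, 1_120_000)


def copy_number_range(fractional_abundance):
    """Copies per cell implied by a fractional abundance at both budget extremes."""
    return tuple(fractional_abundance * n for n in PROTEINS_PER_CELL)


high = copy_number_range(0.05)   # most abundant proteins
low = copy_number_range(1e-5)    # lowest observed APEX scores
# Orders of magnitude spanned by the fractional abundances themselves
dynamic_range = math.log10(0.05 / 1e-5)

print("most abundant:", [round(x) for x in high])
print("least abundant:", [round(x, 1) for x in low])
print("dynamic range (log10):", round(dynamic_range, 1))
```

Because the APEX scores are noted to overestimate true proportional abundance, these numbers are best read as upper bounds.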

1.2.4  Carbon 

Carbon  allocation  to  biochemicals  is  shown  in  Figure  3B  for  our  range  of  modeled  C:N 
ratios. The two major components of the cellular carbon budget are protein and lipids,
and the largest effect of shifting elemental stoichiometry is on the relative amounts of C
in  the  two  pools.  In  case  I,  there  is  twice  as  much  carbon  in  protein  as  in  the  lipids,  while 
in  case  III,  the  proportions  are  reversed  (nearly  twice  as  much  in  lipids  as  in  protein). 
Most  cells  probably  divide  net  photosynthate  approximately  equally  between  the  two, 
which  is  typical  for  phytoplankton  (Falkowski  and  Raven,  2007).  This  budget  does  not 
include  a  specific  amount  of  stored  carbon,  likely  to  be  in  the  form  of  glycogen  or  other 
polysaccharides,  which  is  produced  during  the  day  and  catabolized  at  night  (Chapter  4 
and Zinser et al., 2009) and could constitute a substantial proportion of the cellular C
budget  early  in  the  dark  period.  Other  components  of  the  carbon  budget  are  small, 
amounting  to  no  more  than  5%  individually  and  15%  collectively,  but  should  not  be 
disregarded.  Metabolites,  in  particular,  might  comprise  only  3-5%  of  cellular  carbon,  but 
this  is  probably  the  pool  of  cellular  biochemicals  that  is  turning  over  at  the  highest  rate, 
as  well  as  that  most  likely  to  exchange  with  the  extracellular  medium  and  thereby  impact 
water-column  chemistry. 

These  elemental  and  biochemical  budgets  provide  a  useful  baseline  for  considering  how 
gene  expression  and  nutrient  utilization  can  influence  one  another.  In  the  following 
sections,  we  explore  two  aspects  of  this  interplay: 

•  To  what  extent  adaptation  to  oligotrophic  conditions  might  drive  genome 
streamlining  in  Prochlorococcus 

•  How  these  small  cells  with  limited  pools  of  biochemicals  cope  with  stochastic 
effects  in  gene  expression 

2.  Impact  of  genome  streamlining  on  cellular  nutrient  budgets 

2.1  Genome  streamlining:  occurrence  and  hypotheses 

The  cellular  biochemical  budgets  in  Section  1  enable  us  to  assess  the  effect  of  genome 
streamlining  in  Prochlorococcus  on  its  nutrient  requirements.  It  has  been  suggested  that 
the loss of genes and decrease in genomic G+C content evident along the evolutionary
grade  from  Synechococcus  through  low-light  Prochlorococcus  to  high-light 
Prochlorococcus  (Figure  4)  reflects  specialization  for  growth  in  oligotrophic  habitats 
(Dufresne  et  al.,  2005;  Partensky  and  Garczarek,  2010).  Other  small  marine  bacteria, 
notably the abundant SAR11 α-proteobacteria, also have small, A+T-rich
genomes, and a similar nutrient-economy rationale has been proposed to explain this
(Giovannoni et al., 2005), though the evolutionary context is not well characterized in
these cases. One hypothesis (Wilhelm et al., 2007) holds that nutrient limitation places
these bacteria under strong selective pressure to excise any ‘extraneous’ DNA whose
expression  is  not  essential  or  immediately  advantageous. 

Small, A+T-rich genomes are not, however, limited to bacteria from nutrient-poor
environments (nor are all oligotroph genomes compact and G+C-poor). Instead, the
majority  of  such  genomes  are  found  in  parasites  (Dagan  et  al.,  2006),  for  whom  the 
explanation  for  genome  streamlining  is  quite  different.  For  parasites,  the  stable,  nutrient- 
rich  environment  of  the  host  renders  many  cellular  functions  (especially  biosynthetic 
ones)  superfluous,  thereby  relaxing  selective  pressure  to  maintain  those  functions  and 
allowing  drift  to  eventually  eliminate  genes  from  the  genome.  So  in  one  case,  genome 
streamlining  is  proposed  to  be  the  result  of  strong  selective  pressures  from  the 
environment,  while  in  the  other,  streamlining  results  from  weakening  of  environmental 
selective  pressure.  These  scenarios  are  not  necessarily  contradictory  and  both  could  be 
true;  it  would,  however,  be  an  unusual  example  of  convergent  evolution  for  these 
opposing  causes  to  produce  the  same  effect. 

Using  the  cellular  elemental  and  biochemical  budgets  developed  above,  we  can  quantify 
the  relative  impact  of  genome  size  reduction  and  base  composition  changes  on  nutrient 
requirements.  We  consider  the  changes  in  genome  composition  between 
Prochlorococcus  MIT9313  and  MED4,  namely  a  reduction  in  genome  size  by  750kbp 
and a drop in G+C content from 50.7% to 30.8% (Fig. 4). The effects on nutrient budgets
are calculated relative to the cellular compositions in Section 1, which are for a high-light
adapted, MED4-like cell. The intent here is not to compare nutrient requirements of
MIT9313- and MED4-type cells, but rather to gauge the additional burden that carrying a
MIT9313-like genome would have imposed on a MED4-like cell.

Figure 4. Genome characteristics of marine picocyanobacteria. At left, cladogram of sequenced
Prochlorococcus and marine Synechococcus strains (from Kettler et al. 2007). At right, plot of
proportional amino acid usage in protein-coding sequences. Changes are largely tied to the trend
of decreasing genomic G+C content, exemplified by the 5-fold drop in the Arginine:Lysine ratio
(right-hand column and bold lines in plot). Dotted arrows indicate branches on the tree where the
largest shifts in amino acid usage appear to have occurred.

2.2  Consequences  for  nutrient  requirements 

The  larger  genome,  of  course,  requires  more  phosphorus  to  duplicate.  An  additional 
750kbp  would  increase  the  size  of  the  MED4  genome  by  45%.  The  net  increase  in  the 
cellular  P  requirement  for  just  the  extra  DNA  would  be  19.1%.  Assuming  these 
additional  genes  were  expressed  at  the  same  average  level  as  the  rest  of  the  genome,  the 
additional  mRNA  would  add  1.1%  to  the  phosphorus  quota.  Hence  the  total  P  savings 
from  the  smaller,  high-light  genome  amounts  to  roughly  20.2%  of  the  cellular  phosphorus 
budget  -  in  total,  an  amount  likely  visible  to  selection  under  P-limiting  conditions. 
However,  DNA  is  lost  from  bacterial  genomes  in  fragments  much  smaller  than  750kbp  - 
perhaps  only  a  few  tens  of  bases  at  a  time  -  and  the  incremental  effect  of  each  loss  is 
much  smaller  and  less  likely  to  be  selected  for.  Bragg  and  Wagner  (2009)  have  shown 
how  single-atom  changes  in  gene  composition  can  be  adaptive  under  nutrient  limitation, 
if  the  selective  pressure  is  strong  enough.  Quantifying  this  selective  pressure  would 
require  knowledge  of  the  effective  population  size  of  Prochlorococcus  (i.e.,  the  size  of  a 
subpopulation bearing the same degree of neutral genetic variation as the total
population), which is difficult to ascertain for bacteria generally (Fraser et al., 2007). The
global census population is of order 10^27 cells (Partensky et al., 1999), though the
effective  population  size  could  be  many  orders  of  magnitude  smaller.  The  potential 
exists  for  a  huge  effective  population  size  to  select  for  even  small  chromosomal  changes 
in  Prochlorococcus.  As  discussed  below,  however,  there  are  multiple  lines  of  evidence 
from high-light isolate genomes that suggest that ‘extraneous’ DNA is not always rapidly
excised  from  cells. 
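
The phosphorus arithmetic above can be retraced in a short sketch. Two inputs are assumptions rather than values given in this section: a MED4 genome size of about 1.66 Mbp, and a DNA share of cellular P of about 42.4%, which is back-calculated from the quoted 19.1% and 45% (the text says only "more than 40%"). Variable names are hypothetical.

```python
EXTRA_DNA_BP = 750e3      # genome size difference MIT9313 vs. MED4, from the text
MED4_GENOME_BP = 1.66e6   # assumption: approximate MED4 genome size
DNA_SHARE_OF_P = 0.424    # assumption: back-calculated as 19.1% / 45%
MRNA_P_INCREASE = 0.011   # extra mRNA cost quoted in the text

# Relative growth of the genome if the extra DNA were carried
genome_increase = EXTRA_DNA_BP / MED4_GENOME_BP
# The extra DNA scales the DNA share of the P budget by the same factor
dna_p_increase = genome_increase * DNA_SHARE_OF_P
# Add the cost of expressing the extra genes at the average level
total_p_increase = dna_p_increase + MRNA_P_INCREASE

print(f"genome size increase: {genome_increase:.1%}")
print(f"extra P for DNA: {dna_p_increase:.1%}")
print(f"total extra P: {total_p_increase:.1%}")
```

With unrounded inputs the results differ from the quoted 19.1% and 20.2% only in the last digit.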

The effects of genome streamlining on the cellular nitrogen budget are more equivocal.
750,000 base pairs of DNA amounts to 1.2-3.7% of the cellular N budget. The effect of
the decrease in G+C content is even smaller - if the MED4 genome were 50.7% G+C,
the cellular N requirement would increase by only 0.1-0.2%. Because of a bias in the
genetic code that associates G+C-rich codons with N-rich amino acids, a decrease in
genomic G+C will also produce a lower-N proteome. This can be clearly seen in the
trend in the Arg:Lys ratio along the evolutionary grade, which decreases fivefold from
~2.3 in Synechococcus to 0.45 in high-light Prochlorococcus (Fig. 4). Arginine and
lysine are both basic amino acids, and can often substitute for one another in proteins.
Since arginine has 4 nitrogens to lysine’s two, and Arg is encoded by more G+C-rich
codons (CGx, AGA, AGG) than Lys (AAA, AAG), a single transition mutation could
save the cell 3 nitrogen atoms. If the MED4 genome coded for amino acids at the same
relative frequency as MIT9313, its proteome would be predicted to be richer in nitrogen
by 0.031 N per amino acid. For the cell compositions outlined in Section 1, this would
increase the N budget by 1.8-2.1%. Hence the total potential nitrogen savings associated
with genome streamlining in MED4 amount to 3.4-5.7% of the cellular N requirement.
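
One plausible accounting for the three nitrogen atoms saved by a single Arg-to-Lys transition is sketched below, using the AGA to AAA change as the example. The nitrogen counts per base and per amino acid are standard chemical values; the split into a DNA term and a protein term is an interpretation of the statement above rather than something the text spells out, and the function names are hypothetical.

```python
# Nitrogen atoms per nucleobase and per amino acid (standard chemical formulas)
N_PER_BASE = {"A": 5, "G": 5, "C": 3, "T": 2}
N_PER_AMINO_ACID = {"Arg": 4, "Lys": 2}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def n_in_base_pairs(codon):
    """Nitrogen atoms in the double-stranded DNA that encodes one codon."""
    return sum(N_PER_BASE[b] + N_PER_BASE[COMPLEMENT[b]] for b in codon)


def n_saved(old_codon, old_aa, new_codon, new_aa, protein_copies=1):
    """Return (DNA saving, protein saving) in N atoms for one codon change."""
    dna = n_in_base_pairs(old_codon) - n_in_base_pairs(new_codon)
    per_copy = N_PER_AMINO_ACID[old_aa] - N_PER_AMINO_ACID[new_aa]
    return dna, per_copy * protein_copies


# Single G-to-A transition at the second codon position: AGA (Arg) -> AAA (Lys)
dna_saving, protein_saving = n_saved("AGA", "Arg", "AAA", "Lys")
print("N saved in DNA:", dna_saving)
print("N saved per protein copy:", protein_saving)
print("total for one protein copy:", dna_saving + protein_saving)
```

The mRNA is omitted because G and A carry the same number of nitrogen atoms. Passing a larger `protein_copies` shows how the protein term dominates for highly expressed genes.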

As  in  the  case  of  phosphorus,  each  small,  incremental  change  in  genome  composition 
would  have  to  be  visible  to  selection  -  which  could  be  plausible,  given  the  potentially 
huge  effective  population  size  of  Prochlorococcus,  and  would  be  influenced  by  the 
expression  level  of  each  protein.  Nitrogen  savings  in  highly-expressed  proteins  will  have 
a  proportionally  larger  impact  on  the  cellular  N  budget,  and  therefore  be  more  likely  to  be 
selected  under  N  limitation.  The  calculation  of  fractional  protein  abundances  based  on 
the  data  presented  in  Chapter  4  equips  us  to  address  this  question.  Figure  5  shows  three 
views  of  the  amino  acid  composition  of  the  proteome:  the  average  of  coding  sequences  in 
the  genome  (also  plotted  in  Figure  4),  the  proteome  composition  suggested  by  the 
fractional  abundance  measurements  (i.e.,  the  average  of  sequences  of  detected  proteins, 
weighted  by  APEX  scores),  and  the  average  of  sequences  of  the  51  most  abundant 
proteins in MED4 (i.e., those with highest average APEX scores over the diel cycle; see
Chapter 4). The proteome-as-expressed is indeed lower in nitrogen than the proteome-as-coded,
to the tune of 0.03 N per amino acid (1.34 vs. 1.31), or 2.4%. The most abundant
proteins are even slightly lower in N content. This suggests that the sequences of
highly-expressed proteins have in fact been selected to limit N demand.

Figure 5. Amino acid usage in the MED4 proteome, contrasting the proteome-as-coded
(orange) with the proteome-as-expressed (light & dark green). Orange bars are the aa
usage in protein-coding regions of the genome (as also plotted in Fig. 4). Dark green bars
are the composition of the proteome as detected over the diel cell-division cycle (see
Chapter 4; based on weighting by APEX scores). Light green bars are the average amino
acid composition of the 51 most abundant MED4 proteins (Chap. 4, Table 2). The number
of nitrogen atoms per amino acid in each set of sequences is also indicated.
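
The "N per amino acid" metric used in this comparison is an abundance-weighted average of the nitrogen atoms in each residue. A minimal sketch of that calculation follows. The two compositions are hypothetical toy inputs, not the MED4 data, and the function name is an assumption; only the per-residue nitrogen counts are standard values.

```python
# Nitrogen atoms per amino acid (one-letter codes); all others carry one N
N_PER_RESIDUE = {aa: 1 for aa in "ACDEFGHIKLMNPQRSTVWY"}
N_PER_RESIDUE.update({"R": 4, "H": 3, "K": 2, "N": 2, "Q": 2, "W": 2})


def mean_n_per_residue(composition):
    """Weighted mean N atoms per residue; weights need not sum to one."""
    total = sum(composition.values())
    weighted = sum(N_PER_RESIDUE[aa] * w for aa, w in composition.items())
    return weighted / total


# Hypothetical compositions: uniform usage, and the same with some Arg swapped for Lys
uniform = {aa: 1.0 for aa in N_PER_RESIDUE}
shifted = dict(uniform)
shifted["R"] -= 0.5
shifted["K"] += 0.5

print(round(mean_n_per_residue(uniform), 3))
print(round(mean_n_per_residue(shifted), 3))
```

Weighting each detected protein's sequence by its APEX score before tallying residues would give the proteome-as-expressed value, while weighting every coding sequence equally would give the proteome-as-coded value.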

Does  the  genome  streamlining  process  explain  the  relative  nitrogen  efficiency  of  highly- 
expressed  proteins?  Surprisingly,  the  answer  seems  to  be  no.  The  abundant  proteins  in 
MED4  are  not  endmembers  or  extreme  examples  of  the  evolutionary  compositional 
trends  illustrated  in  Figure  4.  Rather,  the  low  N  content  of  abundant  proteins  stems 
primarily from less usage of two N-rich amino acids (Lys and Asn), whose genomic
frequency  increases  towards  the  high-light  clade  (Fig.  4),  in  favor  of  more  usage  of  three 
N-poor  residues  (Ala,  Gly  and  Val)  whose  frequency  decreases  along  the  trend.  Since  A, 
G and V are encoded by G+C-rich codons (GCx, GGx and GTx, respectively), the G+C
content  of  the  50  most  highly-expressed  genes  is  actually  higher  than  the  genome  as  a 
whole  (38.9%  vs.  30.8%).  In  other  words,  highly-expressed  proteins  show  economy  of  N 
usage  despite  the  overall  effect  of  genome  streamlining,  not  because  of  it.  Whatever  the 
evolutionary  process  driving  genome  streamlining,  it  does  not  appear  to  be  closely  linked 
to  reducing  cellular  demand  for  nitrogen. 

2.3  Streamlining  in  the  light  of  evolution 

Viewed  in  a  broader  evolutionary  context,  the  case  for  genome  streamlining  in 
Prochlorococcus  as  being  driven  primarily  by  adaptation  to  nutrient  scarcity  is  tenuous. 
The two kinds of genomic changes - size reduction and G+C loss - are uncoupled in the
evolutionary  history  of  Prochlorococcus.  From  the  phylogenetic  tree  in  Figure  4  it  would 
at  first  appear  that,  rather  than  being  a  gradual,  continuous  process,  there  was  a  single 
episode  of  genome  size  reduction  in  Prochlorococcus,  along  the  branch  between  the 
MIT9313/MIT9303  divergence  and  all  other  strains.  With  more  careful  consideration  of 
the histories of individual gene families, a different picture emerges: the Synechococcus-like
size of the MIT9313/MIT9303 genomes is not due to retention of nearly all genes
from a Prochlorococcus/Synechococcus ancestor, but rather to substantial gene gain within
that lineage (Kettler et al., 2007) - implying that genome size reduction actually occurred
in the Prochlorococcus common ancestor. The slightly enlarged genomes of the NATL
strains - bigger than either the more basal or distal groups in the tree - also reflect
lineage-specific gene acquisition. Moreover, none of these patterns correlates at all with
the  distributions  of  phosphorus-stress  related  genes  among  Prochlorococcus  strains.  As 
shown  by  Martiny  and  Coleman  et  al.  (2006),  closely-related  isolates  with  similar 
genome  sizes  often  bear  radically  different  complements  of  P  acquisition  and  utilization 
genes.  As  noted  by  Coleman  and  Chisholm  (2007),  adaptation  to  phosphorus  availability 
appears  to  occur  on  a  very  different  evolutionary  scale  from  changes  in  genome  size. 

Drops in G+C content, and the associated changes in amino acid usage, also have a
phylogenetic  distribution  distinct  from  the  pattern  of  genome  size  reduction.  These 
appear  to  be  concentrated  in  two  places  on  the  evolutionary  tree:  between 
MIT9313/MIT9303  and  other  Prochlorococcus,  and  between  the  NATL  strains  and  the 
ancestor of the high-light adapted group (Figure 4). If G+C content changes were
primarily  influenced  by  nitrogen  availability  while  genome  size  reduction  is  a 
phosphorus-saving  measure,  it  is  unclear  how  the  two  should  become  so  decoupled  over 
evolutionary  timescales. 

Most  significantly,  there  is  increasing  evidence  that,  even  under  the  strongest  forms  of  P- 
limitation,  cells  bear  a  significant  number  of  genes  that  have  little  or  no  immediate 
adaptive  value.  In  high-light  Prochlorococcus  isolates,  roughly  10%  of  the  genome 
consists  of  hypervariable  regions,  and  the  size  of  these  regions  is  uncorrelated  with  the 
distribution of phosphate-adaptation genes. If genome size were linked to adaptation to P
limitation,  we  would  expect  strains  with  larger  complements  of  P-acquisition  genes  to 
have  smaller  genomes,  but  no  such  correlation  is  observed  among  strains  sequenced  to 
date.  The  hypervariable  10%  of  the  genome  represents  approximately  4.5%  of  the 
cellular P budget that could be considered the cell’s ‘investment’ in gene exchange.
Metagenomic sequence data from the HOT and BATS ocean time-series stations also
provide evidence that adaptation to strong P limitation at BATS has not produced a
decrease  in  genome  size.  If  genome  size  reduction  were  a  strategy  to  adapt  to  nutrient 
limitation,  we  would  expect  many  genes  to  be  less  abundant  at  the  more  strongly  P- 
limited  site.  In  fact,  the  opposite  pattern  is  observed:  core  genes  (i.e.,  those  shared  by  all 
sequenced  Prochlorococcus  strains;  Kettler  et  al.,  2007)  show  indistinguishable  per-cell 
abundance  distributions  at  HOT  and  BATS,  and  of  the  29  flexible  genes  that  are 
differentially  abundant,  26  are  actually  more  abundant  at  BATS  than  HOT  (Coleman  and 
Chisholm,  submitted). 

If nutrient-utilization efficiency fails to account for the evolution of small, A+T-rich
genomes  in  Prochlorococcus,  what  processes  offer  a  more  satisfactory  explanation? 
Partensky  and  Garczarek  (2010)  discuss  the  loss  of  several  DNA-repair  genes  within  the 
Prochlorococcus  lineage,  and  note  that  the  inactivation  of  these  systems  is  likely  to 
increase  the  rate  of  G:C-to-A:T  mutations.  These  authors  also  point  out,  however,  that 
gene  losses  are  not  clearly  linked  to  high  mutation  rates,  as  that  should  result  in  the 
presence  of  numerous  inactivated  pseudogenes,  which  are  nearly  absent  in 
Prochlorococcus  genomes.  Quantification  of  mutation  rates  in  various  Prochlorococcus 
ecotypes  would  be  highly  valuable  in  assessing  the  role  of  DNA  repair  in  genome 
streamlining. 

With regard to gene loss, Lynch (2006) has suggested that the presence of nonfunctional
DNA  in  bacterial  genomes  is  opposed  by  selection  because  of  the  mutational  burden  it 
introduces.  For  example,  a  mutation  in  a  5’  untranslated  region  could  introduce  an 
alternative  start  site  for  its  associated  ORF,  with  deleterious  consequences.  The  strength 
of  the  selection  that  opposes  the  maintenance  of  such  sequences  is  proportional  to  the 
effective  population  size,  which  as  noted  above,  is  difficult  to  discern  at  present.  In 
organisms  such  as  Prochlorococcus,  it  would  appear  that  not  only  nonfunctional  DNA, 
but also coding regions have been excised. This may be tied to reductions in cell size and
metabolic  simplification:  as  overall  expression  levels  drop,  some  genes  enter  a  ‘gray 
zone’  between  functional  and  nonfunctional  DNA.  A  gene  that  is  no  longer  (or  not  yet) 
expressed  may  be  a  waste  of  a  couple  of  thousand  phosphorus  atoms,  but  its  much  greater 
burden  is  as  a  neighbor  to  genes  that  do  need  to  be  expressed  efficiently  and  need  a  well- 
ordered  chromosomal  neighborhood  to  do  so.  This  ‘use  it  or  lose  it’  notion  of  genome 
streamlining is also more consistent with the clustering of recently-acquired genes in
hypervariable islands (Coleman et al., 2006), as islands could serve as genomic ‘holding
pens’  that  allow  active  gene  exchange  without  undue  disruption  of  expression  of  the  rest 
of  the  chromosome.  Ultimately,  genome  streamlining  in  Prochlorococcus  (and  other 
marine  bacteria)  may  well  be  tied  to  oligotrophic  conditions  -  but  through  overall 
reductions  in  gene  expression  (as  noted  in  Section  1,  the  major  nutrient  demand  on  the 
cell)  rather  than  selection  on  the  composition  of  DNA  per  se.  Furthermore,  this  scenario 
unifies  the  root  causes  of  genome  streamlining  in  parasitic  and  oligotrophic  bacteria,  as 
both  could  stem  from  genetic  disuse.  Much  remains  to  be  explored  regarding  the 
evolutionary  trajectories  of  Prochlorococcus  genomes,  and  multiple  driving  forces  are 
likely  to  be  at  work.  Nevertheless,  the  ability  of  population-genomic  processes  to  explain 
genome  characteristics  of  a  wide  variety  of  organisms  inhabiting  many  different  habitats 
(Lynch,  2006)  offers  promise  for  a  deeper  understanding  of  microbial  evolution. 

3. Stochastic gene expression effects and the ecological consequences of noise

3.1  The  nature  of  noise 

The  expression  of  genes  by  cells  is  a  stochastic  phenomenon.  Gene  expression  is 
stochastic  in  that  fluctuations  in  gene  product  abundances,  both  in  a  particular  cell  over 
time  and  between  nominally  identical  cells  at  a  given  time,  are  governed  by  processes, 
such  as  macromolecular  diffusion,  that  are  random  and  can  only  be  described  statistically 
(Kaern et al., 2005; Larson et al., 2009). Note that this does not mean that expression
levels  as  typically  considered  -  as  population  or  steady-state  averages  -  are  random  or 
non-deterministic.  Regulatory  mechanisms  clearly  control  the  averages  and  central 
tendencies;  it  is  in  the  variance  and  high-frequency  dynamics  that  stochastic  effects 
become  significant.  For  individual  cells,  however,  long-term  or  population  averages  are 
irrelevant,  and  stochasticity  can  have  a  major  impact  on  how  a  cell  goes  about 
accomplishing  a  physiological  programme  such  as  cell  division. 

To  illustrate  this  idea,  consider  the  abundance  of  a  given  gene’s  transcript  in  a 
hypothetical  Prochlorococcus  cell.  As  noted  above  (Section  1.2.1),  only  about  12%  of 
the  cell’s  genome  is  transcribed  at  any  given  time,  so  the  mean  transcript  abundance 
(averaged  across  all  genes  and  cells)  is  0.12  copies/cell.  Of  course,  transcripts  only  exist 
in  integer  copy  numbers,  so  this  result  is  more  meaningfully  interpreted  along  the  lines 
of,  “A  given  cell  contains  a  copy  of  a  given  transcript  about  12%  of  the  time,”  or,  “A 
given  transcript  is  present  in  about  12%  of  cells  in  a  population  at  any  given  time.”  In 
reality,  there  is  a  wide  range  of  average  transcript  abundances  (spanning  four  orders  of 
magnitude,  according  to  the  AMEX  results  in  Chapter  4),  so  some  transcripts  are 
probably  found  in  most  cells  most  of  the  time  (average  abundances  at  or  above  1),  while 
others are very rare (~0.001 copies/cell). Nevertheless, the fact that the average
transcript abundance is significantly less than 1 copy/cell means that each individual
transcription event (i.e., each mRNA molecule produced) usually causes a huge
proportional change in the abundance of that gene product in the cell. Since transcription
is a stochastic process, the low average abundance of transcripts in Prochlorococcus cells
means that the actual number of copies of a particular mRNA molecule in a given cell will
be subject to strong and unpredictable fluctuations over time. As a cell tries to perform
more or less of a certain metabolic process (e.g., in response to changing light availability
over the diel cycle) by changing the expression of a related gene, it must cope with the
random noise imposed by stochastic effects on the gene product abundance it is
attempting to regulate.
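
To make the integer-copy interpretation concrete, the sketch below assumes that the copy number of a transcript with the average abundance (0.12 copies/cell) follows a Poisson distribution. That distributional choice is an illustrative assumption introduced here, not a result from this chapter, and the function name is hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing exactly k copies for a Poisson-distributed count."""
    return math.exp(-mean) * mean**k / math.factorial(k)


MEAN_COPIES = 0.12  # average transcript abundance from the text

p_absent = poisson_pmf(0, MEAN_COPIES)
p_present = 1.0 - p_absent                               # at least one copy
p_multiple = p_present - poisson_pmf(1, MEAN_COPIES)     # two or more copies
cv = 1.0 / math.sqrt(MEAN_COPIES)                        # std. dev. over mean

print(f"cells with no copy: {p_absent:.1%}")
print(f"cells with at least one copy: {p_present:.1%}")
print(f"cells with two or more copies: {p_multiple:.2%}")
print(f"coefficient of variation: {cv:.2f}")
```

Under this assumption slightly fewer than 12% of cells hold the transcript at any moment, because a small fraction hold more than one copy, and the standard deviation of the copy number is nearly three times its mean.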

The small size, limited chemical complements and low absolute gene expression levels of
Prochlorococcus  cells  could  have  significant  implications  for  their  physiological 
functioning.  In  particular,  small  cells  might  ‘sample’  the  range  of  physiological  states 
(phenotypes)  available  to  them  in  a  more  discrete  fashion  than  larger  cells  do,  because  the 
pools  of  biochemicals  that  enable  those  different  states  are  smaller  and  thus  integer 
changes  in  copy  numbers  of  gene  products  are  proportionally  more  significant.  This 
‘granularity’  in  gene  product  abundance  makes  small  cells  especially  susceptible  to 
stochastic  effects  in  gene  expression.  Stochasticity  can  give  rise  to  phenotypic  variability 
even  within  an  isogenic  population,  with  various  aspects  of  cell  physiology  and 
metabolism  varying  around  a  population  mean  (Longo  and  Hasty,  2006).  In  large 
populations,  then,  rare  phenotypes  are  more  likely  to  arise  and  have  an  ecological,  and 
potentially  evolutionary,  impact.  Nutrient  scarcity  can  also  exacerbate  stochastic  effects, 
particularly  for  autotrophic  organisms,  since  gene  products  (mRNA  and  proteins)  and 
gene  expression  machinery  (especially  ribosomes)  are  the  most  nutrient-intensive 
biochemicals  in  the  cell.  Prochlorococcus,  as  one  of  the  smallest  and  most  abundant 
free-living  bacteria  and  an  inhabitant  of  exceptionally  nutrient-poor  habitats,  appears  to 
be  in  the  ‘perfect  storm’  of  stochastic  gene  expression  effects.  Here  we  develop  a 
hypothesis  for  how  Prochlorococcus  -  and  small  cells  in  general  -  might  cope  with 
expression  noise,  which  suggests  that  modulating  these  effects  could  be  an  important 
constraint  on  these  organisms’  physiology  and  ecology. 

Stochastic  effects  in  gene  expression  (also  termed  ‘noise’)  have  been  of  interest  for  a 
number  of  years,  and  a  variety  of  experimental  and  theoretical  studies  have  been 
published.  The  significance  of  stochastic  effects  in  processes  such  as  determination  of 
cell  fate  (Losick  and  Desplan,  2008)  and  responses  to  environmental  perturbations  (Acar 
et al., 2008) has been demonstrated. The exploration of stochastic effects in
Prochlorococcus,  however,  is  hindered  by  the  lack  of  genetic  tools  that  would  enable  the 
kind  of  single-cell  studies  that  have  elucidated  similar  phenomena  in  other  microbes. 
Fluorescent protein fusions have been especially powerful tools for visualizing stochastic
effects, and allow detection by flow cytometry (Newman et al., 2006). Transfer of a
GFP-containing plasmid into (and its expression by) one Prochlorococcus strain has been
achieved (Tolonen et al., 2006), but expressed fusions to native proteins have yet to be
demonstrated.  While  direct  experimental  investigation  of  noisiness  in  Prochlorococcus 
gene  expression  awaits  further  developments,  the  cellular  biochemical  budgets  outlined 
in  the  first  section  allow  us  to  bound  the  problem  and  generate  some  useful  hypotheses 
for  how  such  effects  might  be  manifested. 

3.2  Bounding  the  noisiness  of  Prochlorococcus 

We  consider  two  different  scenarios  for  gene  expression  noise,  depending  on  the  noise 
source:  a  “Poisson”  scenario,  where  expression  noise  is  dominated  by  the  random  births 
and  deaths  of  mRNA  molecules,  and  a  “Telegraph”  scenario,  where  noise  is  dominated 
by  random  promoter  binding  events  (Kaufmann  and  van  Oudenaarden,  2007).  The 
Poisson  scenario  is  a  one-state  model  in  that  the  probability  of  transcription  is  constant 
with  time.  The  Telegraph  scenario  is  a  two-state  model,  since  transcription  is  dependent 
on  promoter  binding,  which  is  itself  stochastic.  While  the  Poisson  model  has  generally 
been  considered  to  sufficiently  explain  observations  of  expression  fluctuations  in  bacteria 
(Raj  and  van  Oudenaarden,  2008),  there  is  some  evidence  for  the  transcriptional  bursting 
implied  by  the  Telegraph  model  (Golding  and  Cox,  2006),  so  here  we  consider  both 
possibilities.  The  equations  describing  mRNA  and  protein  noise  under  the  two  scenarios 
are  given  in  Table  2  (Kaufmann  and  van  Oudenaarden,  2007),  and  we  introduce  the  ratio 
of the noise at the mRNA and protein levels, designated H.

A  description  of  each  of  the  noise  parameters,  and  their  values  for  Prochlorococcus  near 
the  upper  and  lower  limits  of  environmentally-observed  growth  rates  (Veldhuis  et  al., 
2005),  are  given  in  Table  3.  Where  required,  values  from  the  budgets  in  Section  1  of  this 
chapter  were  taken,  using  case  II.  Note  that  because  of  the  way  gene  product  abundances 
Table 2. Equations for noise in gene expression at the mRNA and protein levels, from Kaufmann &
van  Oudenaarden  (2007).  In  the  Poisson  scenario,  noise  is  dominated  by  the  random  births  and 
deaths  of  low-copy  transcripts.  In  the  Telegraph  scenario,  transcriptional  bursts  result  from 
random  promoter  binding  events.  See  Table  3  for  parameter  values. 


                                                       Prochlorococcus (a)       E. coli (b)
Parameter                                    Symbol    ----------------------    ----------------------
growth rate                                            0.3 day⁻¹   0.7 day⁻¹    0.6 hr⁻¹    3.0 hr⁻¹
average mRNA abundance (copies/cell) (c)     <m>       0.14        0.14         0.25        1.06
average protein abundance (copies/cell) (d)  <p>       338         338          560         1745
transcription rate (min⁻¹) (e)               λm        0.068       0.057        0.077       0.262
translation rate (min⁻¹) (f)                 λp        0.36        0.84         16.48       57.39
mRNA half-life (min) (g)                     τm        2           2.4          3.2         4.0
promoter off-rate (min⁻¹) (h)                b         0.68        0.57         0.77        2.62

Notes
a: Prochlorococcus cellular compositions from budgets (Sec. 1)
b: E. coli data from Bremer and Dennis (2008) (BD08), unless otherwise noted
c: From total mRNA/cell (Pro: from budgets; E. coli: BD08), number of genes in genome and average gene length
d: From total protein/cell (Pro: from budgets; E. coli: BD08), number of genes in genome and average protein length
e: From steady-state condition, λm = <m>/τm
f: From peptide elongation rate (using growth rate; BD08 Eq. 18), average gene length and ribosome spacing (Pro: from budgets; E. coli: BD08)
g: Pro: Steglich et al., in prep; E. coli: Bernstein et al., 2004, with growth-rate dependence from BD08
h: Assume rapid & reversible promoter binding (Raser & O'Shea 2004), b = 10λm

Table 3. Values of cellular parameters used in calculations of expression noise.
and rates are calculated here, stochastic effects for Prochlorococcus are independent of
cell  stoichiometry  at  a  given  growth  rate.  This  is  certainly  a  simplification,  and  in  real 
settings  there  are  likely  tradeoffs  between  growth,  noise  and  budgets  that  are  important  to 
environmental  adaptation.  To  help  put  the  Prochlorococcus  results  in  context,  values  of 
the  same  parameters  for  E.  coli  growing  at  two  different  rates  are  also  shown  (Table  3). 

As  expected  from  its  slower  growth  and  lower  ribosome  content,  the  translation  rate  in 
Prochlorococcus  is  substantially  lower  than  in  E.  coli. 
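
The derived rates in Table 3 follow from the tabulated abundances through the relations in notes e and h, which can be sketched in a few lines of Python. The function names are illustrative assumptions, not from the thesis, and the inputs are the rounded Table 3 values, so the outputs differ slightly from the tabulated rates.

```python
# Sketch of the steady-state relations in the notes to Table 3.
# Function and variable names are illustrative, not from the thesis.

def transcription_rate(mean_mrna: float, tau_m: float) -> float:
    """Note e: steady-state transcription rate (min^-1) = <m> / tau_m."""
    return mean_mrna / tau_m


def promoter_off_rate(rate: float, factor: float = 10.0) -> float:
    """Note h: off-rate taken as 10x the transcription rate (rapid, reversible binding)."""
    return factor * rate


# (mean mRNA copies per cell, mRNA half-life in minutes), rounded Table 3 values
TABLE3 = {
    "Prochlorococcus, 0.3/day": (0.14, 2.0),
    "Prochlorococcus, 0.7/day": (0.14, 2.4),
    "E. coli, 0.6/hr": (0.25, 3.2),
    "E. coli, 3.0/hr": (1.06, 4.0),
}

for label, (m, tau) in TABLE3.items():
    k = transcription_rate(m, tau)
    print(f"{label}: transcription {k:.3f} /min, off-rate {promoter_off_rate(k):.2f} /min")
```

Both relations treat the tabulated mRNA half-life as the effective transcript lifetime, as the steady-state condition in note e does.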

Under  both  the  Poisson  and  Telegraph  scenarios,  gene  expression  is  noisier  in 
Prochlorococcus  than  E.  coli  at  the  mRNA  level,  but  less  noisy  at  the  protein  level  (Table 
4).  For  both  organisms,  noise  scales  inversely  with  growth  rate,  principally  due  to  the 
larger  pools  of  gene  products  and  expression  components  in  the  faster-growing  cells. 
Stochastic  effects  due  to  mRNA  fluctuations  (the  Poisson  scenario)  are  larger  than  those 
predicted  for  promoter  noise  (the  Telegraph  scenario),  though  this  is  contingent  on 
promoter  kinetics  that  likely  vary  widely  between  individual  genes. 
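
A minimal sketch of the Poisson-scenario calculation, assuming that mRNA noise is 1/<m> and that protein noise is the translation rate times the mRNA half-life, divided by <p>. These two relations are inferred from the numbers in Tables 3 and 4 rather than copied from Table 2, and the names below are illustrative.

```python
# Poisson-scenario noise under two assumed relations (inferred from Tables 3 and 4):
#   mRNA noise    = 1 / <m>
#   protein noise = (translation rate * mRNA half-life) / <p>

def poisson_noise(mean_mrna, mean_protein, translation_rate, tau_m):
    """Return (mRNA noise, protein noise, mRNA:protein noise ratio H)."""
    mrna_noise = 1.0 / mean_mrna
    protein_noise = translation_rate * tau_m / mean_protein
    return mrna_noise, protein_noise, mrna_noise / protein_noise


# (<m>, <p>, translation rate in min^-1, mRNA half-life in min), rounded Table 3 values
CASES = {
    "Prochlorococcus, 0.3/day": (0.14, 338, 0.36, 2.0),
    "Prochlorococcus, 0.7/day": (0.14, 338, 0.84, 2.4),
    "E. coli, 0.6/hr": (0.25, 560, 16.48, 3.2),
    "E. coli, 3.0/hr": (1.06, 1745, 57.39, 4.0),
}

for label, params in CASES.items():
    noise_m, noise_p, ratio = poisson_noise(*params)
    print(f"{label}: mRNA {noise_m:.2f}, protein {noise_p:.4f}, H {ratio:.0f}")
```

Because the inputs are rounded, the results agree with the Poisson rows of Table 4 only to within a few percent. The Telegraph rows are not reproduced here, since the form of those equations cannot be inferred as directly from the tabulated values.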

A  key  prediction  of  this  model  is  that  gene  expression  in  Prochlorococcus  is  much 
noisier  at  the  mRNA  level  -  probably  by  a  factor  of  over  a  thousand  -  than  at  the  protein 
level. Values of the mRNA:protein noise ratio (H value) in Prochlorococcus range from
near  700  in  the  Telegraph  promoter-noise  scenario  at  fast  growth  rates  to  over  3000  in  the 
Poisson  scenario  at  slow  growth  rates.  By  comparison,  the  noise  ratios  in  E.  coli  range 
from  ~6  up  to  only  43.  The  main  factors  contributing  to  the  high  noise  ratio  in 
Prochlorococcus  are  the  small  numbers  of  transcripts  and  slow  translation  rate.  This 
results  in  fewer  proteins  being  produced  per  transcript,  limiting  the  effects  of  translational 
bursting.  In  fact,  the  model  presented  here  implies  that  many  transcripts  in 
Prochlorococcus  may  go  untranslated,  essentially  for  want  of  an  available  ribosome. 
Since  transcription  (and  therefore  transcript  degradation)  is  not  slowed  to  nearly  the  same 
degree  as  translation,  a  sizeable  proportion  of  transcripts  might  be  recycled  before  a 
ribosome  binds  to  them.  While  ‘wasteful’  to  a  certain  extent,  Prochlorococcus  invests 
                                  Prochlorococcus           E. coli
                       Symbol     0.3 day⁻¹   0.7 day⁻¹    0.6 hr⁻¹    3.0 hr⁻¹
Poisson scenario
  mRNA noise                      7.35        7.35         4.07        0.95
  protein noise        Vp         0.0021      0.0060       0.0942      0.1326
  noise ratio          H          3453        1233         43          7.1
Telegraph scenario
  mRNA noise                      0.424       0.424        0.289       0.087
  protein noise        Vp         0.0002      0.0006       0.0094      0.0133
  noise ratio          H          1990        711          30.7        6.5

Table 4. Results of expression noise modeling, using equations and values from Tables 2 and 3.
Under both noise scenarios and over the full range of growth rates, gene expression in
Prochlorococcus is noisier at the mRNA level but less noisy at the protein level than in E. coli.
Protein noise is effectively damped by the slow translation rate in Prochlorococcus.
only a relatively small proportion of nutrients in mRNA. High mRNA turnover could be
energy-intensive,  but  if  the  phenotypic  consequences  of  protein-level  noise  are  severe 
enough,  it  may  be  worthwhile  to  express  genes  in  a  manner  that  damps  stochastic 
fluctuations  in  protein  abundances. 
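
To give a rough sense of how many transcripts might never be translated, the following sketch assumes that ribosome initiation on a transcript and transcript degradation are independent, memoryless processes, so that the number of proteins made per transcript is geometrically distributed with a mean equal to the translation rate times the mRNA lifetime. This is a standard idealization rather than a calculation from the thesis; it treats the Table 3 half-life as a mean lifetime, and the function name is hypothetical.

```python
# Fraction of transcripts degraded before a single protein is made, assuming
# translation initiation (rate per minute) competes with exponential mRNA decay.

def fraction_untranslated(translation_rate, tau_m):
    """P(no protein before decay) = 1 / (1 + b), where b = translation_rate * tau_m."""
    mean_proteins_per_transcript = translation_rate * tau_m
    return 1.0 / (1.0 + mean_proteins_per_transcript)


# (translation rate in min^-1, mRNA half-life in min), rounded Table 3 values
CASES = {
    "Prochlorococcus, 0.3/day": (0.36, 2.0),
    "Prochlorococcus, 0.7/day": (0.84, 2.4),
    "E. coli, 0.6/hr": (16.48, 3.2),
    "E. coli, 3.0/hr": (57.39, 4.0),
}

for label, (rate, tau) in CASES.items():
    print(f"{label}: {fraction_untranslated(rate, tau):.3f} of transcripts untranslated")
```

With the Table 3 values this gives roughly 0.6 for slow-growing Prochlorococcus and about 0.02 for E. coli at 0.6 hr⁻¹, in line with the suggestion that a sizeable share of Prochlorococcus transcripts is recycled before a ribosome binds.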

If  protein-level  expression  noise  is  in  fact  a  constraint  on  cellular  metabolism,  the 
equations  in  Table  2  outline  several  potential  strategies  for  dealing  with  it.  Perhaps  most 
straightforward  is  to  increase  <p>  —  that  is,  to  make  larger  pools  of  proteins.  This,  of 
course,  requires  greater  nutrient  supplies  and  larger  cells,  and  would  be  ecologically 
disadvantageous  in  an  oligotrophic  marine  habitat.  Another  strategy  is  to  decrease 
mRNA half-life (smaller τm). Prochlorococcus has quite short mRNA half-lives
(Steglich et al., submitted), but it seems likely that this strategy can only be taken so far,
as half-lives would become too short for mRNAs to be translated at all. The third option
is to slow down protein synthesis (smaller λp) - which can be taken as far as the
ecological  consequences  of  the  concomitant  slowdown  in  cell  growth  can  be  tolerated. 
The  damping  of  protein  noise  by  slow  translation  (in  the  sense  of  low  protein  yield  per 
transcript  molecule,  not  necessarily  due  to  “slow  ribosomes”)  may  be  a  substantial  factor 
in  the  low,  fixed  cell  division  rate  of  Prochlorococcus. 

It is interesting, then, that Prochlorococcus makes relatively little use of the first
noise-damping option - making more protein and bigger cells - even when nutrients are
abundant.  It  is  clear  that  some  “facultatively  copiotrophic”  (or  conversely,  “facultatively 
oligotrophic”)  microbes  can  tune  cellular  protein  abundances,  sizes  and  growth  rates  to 
the  prevailing  nutrient  regimes,  making  smaller,  slow-growing  cells  under  oligotrophic 
conditions  and  larger,  faster-growing  ones  when  nutrients  are  replete.  It  may  be  that 
nutrient  enrichments  to  epipelagic  waters  are  infrequent  enough  that  Prochlorococcus 
lost  the  genetic  machinery  to  speed  growth  under  meso-  to  eutrophic  conditions  through 
disuse,  as  described  in  the  previous  section.  Not  all  open-ocean  microbes  have  lost  these 
abilities,  however  (Martin  and  Macleod,  1984),  suggesting  that  the  putatively  large 
effective population size of Prochlorococcus may have driven it further down the path of
genome  streamlining. 

The  relative  lifetimes  of  proteins  and  transcripts  may  also  be  crucial  to  expression  noise. 
Steglich  et  al.  (submitted)  have  shown  that  mRNA  half-lives  in  Prochlorococcus  are 
relatively  short,  and  the  timecourse  expression  results  presented  in  Chapter  4  suggest 
(though do not require) that protein lifetimes are relatively long. This τp/τm ratio figures
prominently in several more recent models of stochastic gene expression. In the models
of Pedraza and Paulsson (2008) and Shahrezaei and Swain (2008), which treat bursting
effects explicitly in a temporal framework, the contribution of mRNA noise to
stochasticity at the protein level is modulated by τp/τm. In fact (cf. Pedraza and Paulsson,
2008, Eq. 1 and 2; Shahrezaei and Swain, 2008, Eq. 21), in the case where τp ≫ τm, as
may well be the case for Prochlorococcus, protein noise in these models simplifies to
1/<p>, the inherent small-number lower limit. Temporally-resolved proteomics
measurements, akin to those presented in Chapter 4, should allow gene-product lifetime
experiments analogous to that of Steglich et al. to be performed at the protein level,
providing a global view of τp/τm ratios for Prochlorococcus genes. Together, these
transcript and protein half-life measurements would test whether the combination of
long-lived  proteins  and  rapid  mRNA  turnover  contributes  to  noise  reduction,  and  how 
this  effect  varies  among  different  genes. 

3.3  The  growth  costs  of  noise,  or,  You  can’t  just  shrink  a  Ferrari 

The  low  translation  rate  that  contributes  to  limiting  protein-level  expression  noise  in 
Prochlorococcus  goes  hand-in-hand  with  relatively  slow  cell  growth.  Slow  growth  may 
be  a  general  feature  of  very  small  cells,  if  low  translation  rates  are  a  common  strategy  to 
damp  protein  noise  and  stabilize  metabolism.  To  apply  an  automotive  metaphor,  in  small 
cells,  the  ‘engine’  of  transcription  is  running  fast,  producing  (and  degrading)  mRNA  at 
rates  comparable  to  faster-growing  cells,  but  the  ‘transmission’  of  translation  is  in  low 
gear, producing the ‘torque’ required to control the cell division program at the expense
of  slowing  protein  production.  In  this  scenario,  small  cells  grow  slowly  for  the  same 
reason  a  moped  can’t  go  60  mph  up  a  steep  hill:  the  small  engine  can  only  generate  so 
much  torque  with  the  largest  gear  that  will  fit  on  the  frame.  As  with  motor  vehicles,  there 
may be a size-speed tradeoff in cells: initial increases in size produce speed (growth rate)
gains,  as  a  more  powerful  drivetrain  (expression  machinery)  can  be  accommodated  by  the 
frame.  But  eventually,  an  optimum  power:weight  ratio  is  reached,  maximizing  speed,  as 
exemplified  by  racecars  or  E.  coli.  Above  this  optimum  size,  top  speed/maximum  growth 
rate  declines  (for  bigger  ‘vehicles’,  e.g.,  cargo  trucks,  or  large  protists  such  as 
dinoflagellates)  as  the  chassis  accommodates  heavier  and  heavier  loads. 

There  is  biological  evidence  for  the  growth  rate-cell  mass  relationship  on  both  sides  of 
this  proposed  optimum.  In  E.  coli,  growth  rate  increases  proportionally  to  cell  mass  up  to 
a maximum of a 20 minute doubling time at a mass of 1.02 pg (Bremer and Dennis,
2008). Interestingly, this positive correlation between cell size and growth rate is the
opposite  of  the  relationship  proposed  from  some  compilations  of  phytoplankton  growth 
rates, which have been fit with a (mass)^(-1/4) trend (Finkel et al., 2009); it should be noted
that positive correlations between cell size and growth rate have also been observed for
some  phytoplankton  (Costello  and  Chisholm,  1981).  A  negative  correlation  between 
growth  and  size  in  larger  organisms  has  been  justified  theoretically  in  the  context  of  the 
metabolic theory of ecology (Brown et al., 2004). So one hypothesis, shown
diagrammatically  in  Figure  6,  might  be  that  rapidly-growing  bacteria  such  as  E.  coli  are 
perched  at  an  optimum  at  the  border  between  metabolically-dominated  and 
stochastically-dominated  growth  regimes.  In  this  view,  the  increase  in  growth  rate  gained 
by  alleviating  metabolic  burdens  in  smaller  cells  is  counterbalanced  by  the  challenge  of 
speeding  up  a  transcriptional  program  whose  smaller  pools  of  gene  products  are  more 
susceptible  to  stochastic  effects.  Testing  this  hypothesis  would  require  intercalibration  of 
growth rate-cell size relationships across a broad range of taxa (especially small,
slow-growing bacteria and archaea), which is challenging (Finkel et al., 2009). But one can’t
simply shrink down a Ferrari and expect it to still go as fast.

[Figure 6 graphic; x-axis: log cell mass]

Figure 6. Schematic of hypothesized relationship between cell mass and growth rate. At low cell
mass, small pools of gene products result in large stochastic effects in gene expression; slow growth
is suggested as one way to damp these effects at the protein level and thereby maintain metabolic/
phenotypic stability. For larger cells, stochastic effects are small and growth rates are limited primarily
by metabolic considerations. Fast-growing bacteria such as E. coli are proposed to live at the border
between the two growth-limitation regimes.

While  stochastic  gene  expression  effects  might  be  observable  in  the  laboratory  (though 
not  yet  in  Prochlorococcus)  and  at  least  partially  predictable  on  theoretical  grounds,  it  is 
less  clear  how  consequential  they  are  for  organismal  physiology  and  even  less  clear  for 
ecology.  Experimental  studies  to  date  have  shown  that  increased  stochasticity  can 
provide  an  advantage  when  environmental  conditions  fluctuate.  With  the  notable 
exception  of  the  diel  cycle  (Chapter  4),  the  tropical/subtropical  open-ocean  habitat  of 
Prochlorococcus  is  generally  considered  one  of  the  more  stable  environments  on  earth. 
Physical  and  chemical  conditions  in  the  upper  water  column  of  subtropical  gyres  change 
relatively  slowly,  and  seasonality  is  weak.  In  recent  years,  however,  increasing  attention 
has  been  paid  to  smaller-scale  variability  that  may  be  highly  relevant  for  microbial 
physiology.  Even  in  the  well-mixed  upper  open  ocean,  eddies  and  aeolian  deposition 
events  produce  substantial  changes  in  nutrient  availability  over  scales  of  10-10  m  and 
days  to  weeks.  Perhaps  more  importantly  over  longer  time  scales,  even  conditions  that 
appear  stable  using  conventional,  ‘bulk’  oceanographic  measurements  are  increasingly 
appreciated  as  highly  heterogeneous  on  the  spatial  scales  of  individual  microbial  cells 
(Seymour et al., 2004; Mitchell et al., 2008). Microscale events such as the formation
and  dissipation  of  nutrient  patches  or  the  aggregation  and  sinking  of  rare,  large  particles 
are  occurring  constantly  and  represent  environmental  fluctuations  from  the  perspective  of 
individual  cells.  The  ecological  (and  ultimately  evolutionary)  consequences  of  stochastic 
gene  expression  in  planktonic  marine  microbes  may  well  be  tied  to  the  exploitation  of 
microscale,  transient  features  in  the  water  column. 

4.  Conclusions  and  future  directions 

This chapter presents a simple model for the molecular composition of hypothetical
high-light Prochlorococcus cells, based on available experimental and genomic data.
Budgeting of cellular contents of carbon, nitrogen and phosphorus reveals that almost
half  of  cellular  P  is  contained  in  the  chromosome,  leaving  enough  for  only  about  600 
ribosomes and the transcription of 10-15% of the genome. Some 70-90% of cellular N is in
protein,  and  copy  numbers  of  individual  proteins  range  up  to  perhaps  50,000  copies  per 
cell.  We  suggest  that  genome  streamlining  is  unlikely  to  be  driven  primarily  by 
adaptation  to  oligotrophic  conditions,  and  hypothesize  that  population-level  evolutionary 
processes  are  more  likely  to  explain  genome  evolution  in  Prochlorococcus. 

Consideration  of  cell  composition  in  light  of  stochastic  effects  in  gene  expression  yields  a 
prediction  that  transcript  levels  in  Prochlorococcus  cells  are  likely  to  be  very  noisy,  but 
that  noise  is  strongly  damped  at  the  protein  level  due  to  slow  translation.  Slow 
translation  is  concomitant  with  slow  growth,  and  we  propose  a  hypothesis  for  growth 
limitation  of  small  cells  by  the  need  to  limit  the  metabolic  and  phenotypic  influence  of 
stochastic  gene  expression. 

There  is  a  large  gulf  in  microbial  biogeochemistry  between  the  rapidly-expanding 
inventory  of  microbial  diversity  and  knowledge  of  the  molecular  composition  of 
microbial  communities.  A  very  few  organisms,  notably  E.  coli  and  S.  cerevisiae,  have 
been  the  subject  of  extensive  quantitative  investigation.  While  these  lab  models  are  of 
limited  environmental  relevance  in  themselves,  they  have  provided  essential  data  on  the 
molecular  organization  of  universal  biological  structures,  such  as  ribosomes  and 
membranes.  Armed  with  such  parameters,  and  with  some  basic  information  about  cell 
size  and  elemental  composition  -  and  a  sequenced  genome  -  one  can  actually  constrain 
the  molecular  makeup  of  any  microorganism  to  a  reasonably  informative  degree.  For  a 
cultivable  organism  like  Prochlorococcus,  more  physiological  and  biochemical  data  are 
available,  enabling  greater  refinement  of  a  cell  composition  model.  Any  natural 
microbial  population  is  comprised  of  a  wide  variety  of  organisms,  ranging  in 
characterization  from  undiscovered  to  cultivated,  and  in-depth  investigation  of  the 
properties  of  each  and  every  relevant  microbial  type  is  unfeasible.  A  more  viable  goal  is 
to build a conceptual framework to ‘bootstrap’ the information about the composition of
better-characterized organisms into inferences about less well-known members of the
community. 

Ultimately,  we  would  like  to  be  able  to  use  data  from  high-throughput,  culture- 
independent  techniques  as  inputs  to  biogeochemical  models  of  microbial  communities. 
Such  techniques  are  already  available  to  measure  some  aspects  of  cellular  composition: 
flow  cytometry,  for  example,  can  count  cells  and  detect  a  variety  of  pigments,  and  with 
the  right  sort  of  dye  or  probe  can  assay  for  a  huge  range  of  properties,  and  can  examine 
thousands  of  cells  per  second.  Nucleic  acid  sequence  information,  of  course,  is  now  a 
readily  observed  property  of  any  microbial  system,  and  can  even  be  obtained  from  single, 
uncultivated cells (Rodrigue and Malmstrom et al., 2009). Chemical imaging methods
such  as  SIMS  (secondary  ion  mass  spectrometry)  and  X-ray  microanalysis  are 
increasingly  used  to  examine  the  makeup  of  microbial  cells,  and  high-speed  cell  sorting 
and  microfluidic  devices  promise  new  ways  of  preparing  environmental  samples  for 
molecular  analysis.  The  real  benefit  of  a  compositional  view  of  cells  will  come  in 
integrating  information  from  these  disparate  technologies,  identifying  key  unknowns  for 
further  investigation,  and  feeding  into  process  models  of  microbial  activity  and  its 
biogeochemical  consequences.  Biogeochemistry  that  is  informed  by  systems  biology 
will  undoubtedly  afford  deeper  insights  into  the  structure,  function  and  coevolution  of 
earth’s  life  and  environments. 
The work presented in this thesis has sought to contribute to understanding of how, when
and  why  biological  activity,  especially  primary  production  by  oxygenic  photoautotrophs, 
has  driven  the  secular  oxidation  of  earth’s  surface  environment.  Using  a  variety  of 
molecular  tools  and  examining  processes  at  timescales  spanning  hours  to  billions  of 
years,  the  goal  of  these  studies  has  been  to  provide  new  insights  into  the  biogeochemistry 
of  marine  microbes,  both  as  documented  in  the  rock  record  from  the  distant  past  and 
visible  in  the  activities  of  living  organisms.  The  principal  findings  include: 

•  The  molecular  fossil  record  bears  clear  evidence  of  a  diverse  microbial 
community  in  the  oceans  by  at  least  2.7  billion  years  ago,  including  members 
of  all  three  domains  of  life. 

•  The  presence  of  biomarkers  for  aerobic  biochemistry  long  before  the 
oxygenation  of  the  atmosphere  is  fully  consistent  with  their  production  in 
persistently  microaerobic  regions  of  the  surface  ocean. 

•  The  diel  cell  cycle  of  Prochlorococcus  and  associated  shifts  in  metabolic 
fluxes  are  underlain  by  small  variations  in  a  proteome  whose  composition  is 
generally  stable,  despite  strong  oscillations  in  expression  at  the  transcript  level. 

•  The  relatively  simple  cellular  and  biochemical  architecture  of  Prochlorococcus 
enables  a  compositional  budget  to  be  constructed  of  the  sort  that  can  inform 
biogeochemical  models,  as  well  as  guide  inferences  about  the  ecological 
consequences  of  genome  streamlining  and  stochastic  gene  expression  in  small 
cells. 
Future directions: the molecular record of primary production and oxidation

The  work  presented  here  has  helped  to  place  the  late  Archean  biomarker  record  on  firmer 
footing,  but  much  remains  to  be  learned  about  the  molecular  fossils  contained  in  the  vast 
and  still-growing  archive  of  material  recovered  by  the  Agouron  drilling  project  and  other 
scientific  drilling  initiatives.  In  particular,  it  would  be  helpful  to  characterize  and  identify 
the  controls  on  finer  spatial-scale  variability  in  biomarker  compositions,  and  map 
correlations  with  other  geochemical  proxies  in  greater  detail.  While  the  low  extractable 
hydrocarbon  contents  of  these  overmature  rocks  make  high-resolution  measurements 
quite  challenging,  an  understanding  of  such  distributions  would  both  lend  greater 
confidence  to  the  molecular  fossils’  syngeneity  and  likely  provide  new  insights  into  the 
ecological  and  biogeochemical  structure  of  the  microbial  community. 

The  late  Archean  Hamersley  and  Transvaal  basins,  on  which  much  Precambrian 
biomarker  work  has  focused,  are  essentially  the  end  of  the  line:  no  older  sedimentary 
rocks  of  equivalently  low  metamorphic  grade  (such  that  biogeochemically-informative 
molecular  fossils  are  likely  to  have  been  preserved)  have  yet  been  found.  Pushing  the 
molecular  fossil  record  deeper  into  the  Archean  will  be  a  considerable  challenge,  and 
likely  contingent  on  both  methodological  advances  and  new  subsurface  sampling  efforts. 
But,  having  jumped  as  far  back  as  possible,  it  may  be  fruitful  to  turn  back  towards  the 
Proterozoic,  during  which  time  both  the  redox  structure  of  the  atmosphere  and  oceans  and 
the  ecology  of  primary  producers  underwent  repeated  reorganizations.  Establishing  a 
more  global,  continuous  molecular  fossil  record  for  the  Paleo-  and  Mesoproterozoic  will 
help  to  clarify  the  biogeochemical  and  evolutionary  transitions  involved  in  the  long 
journey  from  the  late  Archean  to  the  dawn  of  the  modern  Phanerozoic  earth  system. 
Future directions: molecular networks in biogeochemistry

It  is  clear  that  the  tools  of  systems  biology  can  open  new  windows  on  biogeochemistry; 
the  two  fields  share  a  desire  to  unpack  the  interlocking,  nested,  overlapping  molecular 
interactions  in  complex  networks  while  keeping  sight  of  how  those  networks  produce 
first-order  emergent  phenomena  like  cell  growth  and  net  primary  production.  Likewise, 
biogeochemistry  can  offer  systems  biology  a  framework  for  understanding  the  real 
environmental  conditions  that  biological  systems  respond  to,  where  all  variables  are 
continuous  rather  than  binary.  Much  fruitful  cross-fertilization  seems  possible  between 
the  two  fields.  For  instance,  an  integrative  transcriptomic/proteomic/metabolomic/ 
biochemical/regulatory  investigation  of  central  carbon  metabolism  in  Prochlorococcus 
would  illustrate  the  systems-level  control  of  a  key  physiological  process.  A  not- 
insubstantial  portion  of  marine  primary  production  flows  through  this  network,  so  a  more 
in-depth  understanding  of  its  functioning  would  also  yield  biogeochemical  and 
oceanographic  insights. 

The  development  of  novel  assays  for  biogeochemically-significant  processes  that  can  be 
applied  to  environmental  samples  is  one  of  the  most  challenging  yet  exciting  prospects  of 
this  interaction.  Taking  the  tools  of  systems  biology  outside  would  enable  a  far  more 
direct  view  of  both  the  molecular  bases  of  biogeochemistry  and  of  the  full  range  of 
adaptive  strategies  employed  by  life’s  diversity  in  the  face  of  myriad  environmental 
pressures  that  cannot  be  simulated  in  the  laboratory.  The  challenges  lie  in  the  molecular 
complexity  of  natural  materials,  where  analytes  of  interest  are  often  a  small  minority  of  a 
vast  number  of  potentially  confounding  molecular  species.  Such  technological  obstacles 
are,  to  one  degree  or  another,  surmountable.  The  promise  of  a  deeper,  richer 
understanding of the functioning of the earth system, its evolution and our impacts on it, is
too  valuable  to  pass  up. 
Summary

Prochlorococcus  and  Synechococcus  are  the  two 
most  abundant  marine  cyanobacteria.  They  represent 
a  significant  fraction  of  the  total  primary  production 
of  the  world  oceans  and  comprise  a  major  fraction  of 
the  prey  biomass  available  to  phagotrophic  protists. 
Despite relatively rapid growth rates, picocyanobacterial
cell densities in open-ocean surface waters
remain  fairly  constant,  implying  steady  mortality  due 
to  viral  infection  and  consumption  by  predators. 
There have been several studies on grazing by specific
protists on Prochlorococcus and Synechococcus
in culture, and of cell loss rates due to overall
grazing  in  the  field.  However,  the  specific  sources  of 
mortality  of  these  primary  producers  in  the  wild 
remain  unknown.  Here,  we  use  a  modification  of  the 
RNA  stable  isotope  probing  technique  (RNA-SIP), 
which involves adding labelled cells to natural seawater,
to identify active predators that are specifically
consuming  Prochlorococcus  and  Synechococcus  in 
the  surface  waters  of  the  Pacific  Ocean.  Four  major 
groups  were  identified  as  having  their  18S  rRNA 
highly labelled: Prymnesiophyceae (Haptophyta), Dictyochophyceae
(Stramenopiles), Bolidomonas (Stramenopiles)
and Dinoflagellata (Alveolata). For the first
three  of  these,  the  closest  relative  of  the  sequences 
identified  was  a  photosynthetic  organism,  indicating 
the presence of mixotrophs among picocyanobacterial
predators. We conclude that RNA-SIP is
a useful method to identify specific predators for
picocyanobacteria  in  situ,  and  that  the  method  could 
possibly  be  used  to  identify  other  bacterial  predators 
important  in  the  microbial  food-web. 


Introduction 

The  mechanisms  that  regulate  microbial  communities  are 
a  central  issue  in  ocean  ecology.  Phagotrophic  protists 
and  viruses  are  the  main  sources  of  mortality  for  these 
microbes  in  oligotrophic  environments  (Fuhrman  and 
Campbell, 1998; Partensky et al., 1999a) and play an
important  role  in  shaping  microbial  communities  in  the 
ocean  (so-called  ‘top-down’  regulation)  (Sherr  and  Sherr, 
2002;  Pernthaler,  2005).  One  of  the  outstanding  questions 
is  precisely  how  the  food-web  is  structured:  which  protists 
eat  which  microbes? 

Grazing by eukaryotes is a major cause of bacterial
mortality in the ocean and a major force shaping
microbial communities in those environments (Jürgens
and Matz, 2002). Heterotrophic nanoflagellates and
ciliates are considered to be the primary grazers on
planktonic marine bacteria (Sherr et al., 1989; Simek and
Chrzanowski, 1992; Cho et al., 2000; Sherr and Sherr,
2002). In general, grazing by bacterivorous protists upon
suspended bacteria is size selective (Chrzanowski and
Simek, 1990; Gonzalez et al., 1990; Simek and
Chrzanowski, 1992; Jürgens and Güde, 1994; Anderson and
Rivkin, 2001), with most protists grazing preferentially on
medium-sized bacterial cells.

Because Prochlorococcus and Synechococcus numerically
dominate the oxygenic phototrophs in ocean waters
(Chisholm et al., 1988; Partensky et al., 1999a,b),
understanding their sources of mortality is central to
understanding the structure of the microbial food-web, and
the regulation of marine productivity and nutrient cycling
in the ocean. Laboratory studies using cultured
heterotrophic flagellates and ciliates have shown that they
can survive when fed Prochlorococcus and Synechococcus
(Christaki et al., 1999; Guillou et al., 2001) and that
some feed preferentially on one or the other (Christaki
et al., 1999; Guillou et al., 2001). Studies using natural
nanoflagellate populations show that the nanoflagellate
community composition shapes the picoautotrophic
community structure and, vice versa, the picoautotrophic
community structure favours or inhibits the growth of
some nanoflagellate groups (Christaki et al., 2005).
However, these studies do not address the question of the
identity of the grazers feeding on bacteria.

While rates of grazing-induced mortality of picocyanobacteria
have been measured in situ (Sherr et al., 1987;
Hall et al., 1993; Ishii et al., 2002; Massana et al., 2002;
Worden and Binder, 2003; An-Yi et al., 2007), the
specific identity of the grazers feeding on these cells has
not been studied. In the present work, we have used a
modification of an RNA stable isotope probing technique
(RNA-SIP) (Radajewski et al., 2000; Manefield et al.,
2002; Lueders et al., 2004) to identify eukaryotic cells
that consume Prochlorococcus and Synechococcus in
surface waters at the Hawaii Ocean Time Series (HOT)
station ALOHA. A similar approach had previously been
used to identify micropredators of Escherichia coli in a
sample of agricultural soil (Lueders et al., 2006). The
use of this method avoids problems associated with
using non-active bacteria (Gonzalez et al., 1990; Landry
et al., 1991; del Giorgio et al., 1996; Ishii et al., 2002;
Koton-Czarnecka and Chrost, 2003), and enables
molecular taxonomic resolution.


Results  and  discussion 

Characterization  of  the  indigenous  eukaryotic  protist 
community 

We first characterized the diversity of protists in our
sample, collected from the study site, Station ALOHA
(Hawaii Ocean Time Series), through the analysis of the
indigenous 18S rDNA sequences (Figs 1A and 2 and
Fig. S1). The community was similar to those reported
for other oligotrophic surface ocean waters (Countway
et al., 2005; 2007; Not et al., 2007), in terms of first-
and second-rank marine protistan and Super-group taxa
defined by Adl and colleagues (2005). Alveolates, and
specifically Dinozoa, including novel Alveolate groups I
and II (NAI and NAII), are among the most abundant
sequences found. Stramenopiles, including novel Marine
Stramenopiles (MAST), are also well represented
(Figs 1A and 2 and Fig. S1).


Incubation  experiments  with  labelled  cultures 

To determine which protists from this community most
actively grazed on Prochlorococcus and Synechococcus,
¹³C- and ¹⁵N-labelled cultures of these cyanobacteria
were added to seawater samples and incubated for
1 day, allowing the indigenous community to consume
the labelled cells (see Experimental procedures for
details). After 24 h, the microbial community was
collected by filtration, RNA was extracted, and ‘heavy’
(labelled) and ‘light’ (unlabelled) RNA were separated
by density gradient ultracentrifugation. Density-resolved
18S rRNA sequences were amplified, sequenced and
analysed. Sequences from the labelled subfraction
(which are enriched in a subset of sequences as they
are physically separated from the bulk community before
sequencing) are interpreted as being derived from
eukaryotic cells that consumed high numbers of labelled
Prochlorococcus or Synechococcus cells during the
incubation. Sequences in the unlabelled RNA fraction
represent protists that did not graze on the labelled cells
during the incubation. Because different levels of RNA
labelling are likely to occur depending on what fraction
of the diet of a particular grazer consists of Prochlorococcus
and Synechococcus, we analysed only the most
highly labelled fractions (Fig. 3).

We recognize that there are, theoretically, a number of
possible indirect routes for the heavy isotopes to end up in
the 18S rRNA. We analyse these possibilities in detail in
a separate section below, and conclude that direct grazing
on Prochlorococcus and Synechococcus is the most
consistent explanation for the incorporation of label into 18S
rRNA in our experiments.


Community  structure  analysis  using  terminal  restriction 
fragment  length  polymorphism  (T-RFLP) 

Before analysing the sequences of rRNA from the
labelled and unlabelled fractions in detail, we assessed
the quality of the biological replicates and general
differences and similarities among the treatments, using
terminal restriction fragment length polymorphism (T-RFLP) and
cluster analysis (GEPAS, http://www.gepas.org) (Dollhopf
et al., 2001). The eukaryotic cells at the onset of
the experiment (time 0), as well as those that remained
unlabelled after a 24 h incubation (i.e. those that did
not prey on either Prochlorococcus or Synechococcus),
cluster together in both replicates (Fig. 4). The similarity
of these two groups indicates that there were no
significant changes in the food-web structure in the incubation
bottles during the 24 h incubation. More importantly, the
18S rRNA sequences containing the Prochlorococcus-derived
label and Synechococcus-derived label clustered
separately from the time 0 and unlabelled rRNA
samples, indicating that we are identifying a specific
subset of the community that is preying upon these
cyanobacteria. Furthermore, the predator sequences
originating from addition of Prochlorococcus and
Synechococcus did not cluster together, suggesting distinct
predators for these two types of cyanobacteria,
consistent with observations from laboratory studies (Guillou
et al., 2001; Pernthaler, 2005).
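
The comparison of T-RFLP profiles described above can be illustrated with a minimal sketch. It does not reproduce the self-organizing tree analysis run in GEPAS; as a stand-in, it computes a Bray-Curtis dissimilarity between relative peak-area profiles, which is one common way to quantify how far apart two fingerprints are. The fragment lengths and peak areas are hypothetical values invented for illustration, and the function names are assumptions rather than part of any published pipeline.

```python
# Hypothetical T-RFLP profiles: terminal fragment length (bp) -> peak area.
# These numbers are invented for illustration; they are not data from this study.
time0 = {120: 50.0, 245: 30.0, 310: 20.0}
unlabelled = {120: 45.0, 245: 35.0, 310: 20.0}
labelled = {88: 60.0, 245: 10.0, 402: 30.0}


def normalise(profile):
    """Convert peak areas to relative abundances that sum to 1."""
    total = sum(profile.values())
    return {frag: area / total for frag, area in profile.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity of two profiles (0 = identical, 1 = no shared peaks)."""
    p, q = normalise(p), normalise(q)
    # For profiles that each sum to 1, the dissimilarity is 1 minus the shared abundance.
    shared = sum(min(p.get(f, 0.0), q.get(f, 0.0)) for f in set(p) | set(q))
    return 1.0 - shared


d_unlabelled = bray_curtis(time0, unlabelled)
d_labelled = bray_curtis(time0, labelled)
print(f"time 0 vs unlabelled: {d_unlabelled:.2f}")
print(f"time 0 vs labelled:   {d_labelled:.2f}")
```

In this toy case the time 0 profile lies much closer to the unlabelled profile (about 0.05) than to the labelled one (about 0.90), which is the kind of pattern that would place time 0 and unlabelled samples in one cluster and labelled samples in another.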
Fig.  1.  Phylogenetic  assignments  and  relative  frequencies  of  the  rDNA  sequences  from  indigenous  eukaryotic  community,  and  the  labelled
and  unlabelled  rRNA  fractions  in  the  experimental  treatments. 

A.  rDNA  extracted  from  the  total  community. 

B.  Unlabelled  fractions  from  the  density  gradient  separations  and  time  0  samples. 

C.  Samples  with  label  originating  from  Prochlorococcus  or  Synechococcus  added  to  the  experimental  bottles. 

Error  bars  represent  the  standard  deviation  of  the  values  obtained  for  the  biological  duplicates  of  the  libraries.  Phylogenetic  assignment 
follows Adl and colleagues (2005) with classification at the first- (in bold) and second-rank taxonomic level except when indicated as follows:
'Super-groups,  "Phylum,  '"third-rank  taxonomic  level  and  ""novel  Alveolate  groups  I  and  II  (NAI  and  NAII),  or  the  novel  MAST  following 
Not  and  colleagues  (2007). 


Fig.  2.  Unrooted  phylogenetic  tree  inferred  by  maximum  likelihood  (ML)  analysis  of  the  reference  sequences  used  in  the  phylogenetic  analysis 
of  the  clone  libraries  presented  in  this  work  (see  Supporting  information).  Selected  representative  clones  and  colour  circles  indicate  the 
phylogenetic  adscription  of  the  sequences  obtained  in  the  different  clone  libraries.  Blue  clones  and  circles:  sequences  originating  from  the 
DNA-derived  libraries.  Purple  clones  and  circles:  sequences  originating  in  the  unlabelled  fractions  from  the  density  gradient  separations  and 
time 0 samples. Green clones and circles: sequences originating from the labelled fraction of the Prochlorococcus inoculation experiment. Red
clones and circles: sequences originating from the labelled fraction from the Synechococcus inoculation experiment. Partial sequences ranging
from a minimum of 604 bp up to 827 bp were used in the alignment. Bootstrap values over 50% are indicated on the internal branches;
bootstrap values < 50% have been omitted. The proportion of invariant sites (I) was 0.214. The scale bar indicates 5%
divergence. Classification is based on Adl and colleagues (2005) and Not and colleagues (2007). All groups correspond to first and second
rank according to Adl and colleagues (2005) except 'Super-group and "Phylum (Shalchian-Tabrizi et al., 2006), '"third-rank taxonomic level
and ""novel Alveolate groups I and II (NAI and NAII), or the novel MAST following Not and colleagues (2007).
Fig. 3. Relative amount of rRNA in different fractions separated by density gradient centrifugation of 18S rRNA analysed in this study. Two
peaks of RNA were detected in each sample, the lighter containing sequences that did not incorporate the isotopic label (¹⁵N and ¹³C) during
the 24 h incubation, and the heavier corresponding to RNA greatly enriched in the heavy isotope, i.e. from cells that incorporated label from the
Prochlorococcus and Synechococcus that were added to the samples. The particular samples indicated in a blue circle were analysed as the
‘unlabelled fraction’ from the experimental bottles, and those in a red circle as the ‘heavily labelled’ fraction. These particular samples were
chosen  to  maximize  the  sample  size  while  at  the  same  time  avoiding  cross-contamination  of  light  and  heavy  RNA  in  the  subsequent  analyses. 
(A)  and  (B)  represent  sequences  from  biological  replicates  of  samples  amended  with  labelled  Prochlorococcus  (in  A  two  heavy  fractions  were 
used  to  increase  the  amount  of  total  RNA  used  for  constructing  the  clone  libraries)  and  (C)  and  (D)  those  amended  with  Synechococcus.  Total 
RNA  was  detected  fluorometrically  using  Ribogreen  (see  Experimental  procedures). 
Fig. 4. Self-organizing tree (SOTA) of
terminal restriction fragment length
polymorphism (T-RFLP) profiles emerging
from the 18S rRNA sequences from the
different  experimental  treatments.  Samples 
digested  (A)  with  Hhal  and  (B)  with  Rsal. 
Time  0:  T-RFLP  profile  of  rRNA  from  the 
entire  eukaryotic  community  at  the  beginning 
of  the  experiment.  Unlabelled  Pro  and 
Unlabelled  Syn:  T-RFLP  profiles  of  the 
unlabelled  eukaryotic  rRNA,  collected  from 
the  density  gradient,  from  the  bottles  that 
were  incubated  with  labelled  Prochlorococcus 
and  Synechococcus  respectively.  Labelled 
Pro and Labelled Syn: T-RFLP profiles of the
heavily  labelled  eukaryotic  rRNA  that  was 
collected  from  the  same  gradient.  A  and  B 
next  to  the  data  points  represent  the  two 
biological  replicates. 
Analysis  of  the  unlabelled  and  labelled  18S
rRNA  sequences 

We next analysed the identity of the unlabelled and
labelled eukaryotes by cloning and sequencing the 18S
rRNA fragments from the heavily labelled and unlabelled
fractions isolated from the density gradient separation
(Fig. 3). Heavily labelled fractions represent eukaryotes
that have eaten either Prochlorococcus or Synechococcus.
The unlabelled fractions represent eukaryotes in the
community with relatively high levels of rRNA that did not
assimilate label from the cyanobacteria. As has been
reported previously (Stoeck et al., 2007), the sequences in
the unlabelled rRNA-derived library are substantially
different from those in the rDNA library (Figs 1A and B and 2;
Figs S1 and S2), showing that there are some members
of the community that are much more ‘active’ (as
measured by rRNA levels) than others.

Most sequences obtained in the rRNA-derived library
from the time 0 samples and unlabelled fractions
represented members of the Chlorophyta, principally close
relatives of the genus Picochlorum (Figs 1B and 2 and
Fig. S2). Other taxonomic groups identified in these
libraries included the Dictyochophyceae (Stramenopiles) and
Prymnesiophyceae (Haptophyta), and in smaller numbers
relatives of members of Raphidophyceae, Bolidomonas,
Bacillariophyta, Pelagophyceae (all Stramenopiles),
Euglenozoa and Dinozoa (Alveolata) (Figs 1B and 2 and
Fig. S2).

The sequences that appeared in the labelled fractions
(Figs 1C and 2, Figs S3 and S4) - i.e. from cells grazing
on Prochlorococcus and Synechococcus - belonged
primarily to four groups: the Prymnesiophyceae,
Dictyochophyceae, Bolidomonas and Dinoflagellata.
Dictyochophyceae dominated the 18S rRNA sequences
that had incorporated label from Prochlorococcus, while
Bolidomonas dominated those that had incorporated the
label from Synechococcus (Fig. 1C), but it appears that
the four dominant grazers consume both types of cells.
Novel MAST also appeared in both labelled rRNA-derived
libraries; they have been identified as non-pigmented
heterotrophic flagellates with bacterivorous activity
(Massana et al., 2002). Some taxonomic groups appear
to be specific to either Prochlorococcus or Synechococcus
(Figs 1C and 2, Figs S3 and S4), but this could be
simply due to the small library sample size. Certain groups
that were present in the labelled rRNA-derived clone
libraries but were absent in the unlabelled rRNA-derived
clone libraries could have been simply masked by the
high dominance of Chlorophyta in the unlabelled
rRNA-derived clone libraries relative to the other identified
phylogenetic groups.

Ciliates, which are considered important grazers in
some aquatic environments (Sherr and Sherr, 2002;
Pernthaler, 2005), represent a small fraction of the labelled
sequences, which is consistent with recent work showing
that subtropical marine ciliates exhibit almost no grazing
activity on bacterium-sized particles (An-Yi et al., 2007)
and with the experimental results of Christaki and
colleagues (1999) showing that Prochlorococcus and
Synechococcus proved to be poor food sources for
ciliate growth.

The most striking observation in these results is that
three of the four most abundant sequences in the labelled
18S rRNA fraction belong to the taxa Prymnesiophyceae,
Dictyochophyceae and Bolidomonas, whose characterized
members are photosynthetic. Two groups present in
the labelled 18S rRNA fraction, the Pelagophyceae and
Bolidomonas, have not previously been found to consume
bacteria. Some Pelagophyceae feed heterotrophically
on dissolved organic matter (Lomas et al., 2001) but
this group has previously been described as
non-phagotrophic (Cavalier-Smith and Chao, 2006).
Characterized members of the Bolidomonas, the most frequently
detected labelled group in the Synechococcus-fed
samples (Figs 1C and 2 and Fig. S4), are all
photosynthetic. While some members of the Dictyochophyceae,
which dominate the clone libraries from the labelled
Prochlorococcus-fed samples (Figs 1C and 2 and
Fig. S3), are heterotrophs, the closest relative to the
sequences we have identified is Florenciella parvula,
which is photosynthetic. Also among the identified
predators of Prochlorococcus and Synechococcus are
representatives of groups known to be capable of mixotrophy,
including the Chrysophyceae (Nygaard and Tobiesen,
1993), Prymnesiophyceae (Nygaard and Tobiesen, 1993;
Hansen and Hjorth, 2002) and Dinoflagellata (Hansen
and Nielsen, 1997). Almost all of the sequences from the
labelled clone libraries belong to plastid-containing
lineages; only sequences identified as relatives of Telonema
(phylum Telonemia) (Shalchian-Tabrizi et al., 2006) and
Centrohelida come from groups not known to contain
autotrophic members.

Previous work had already presented evidence that
mixotrophic nanoflagellates are important predators in
surface waters and may account for more than 50% of the
bacterivory in them, and that they are more abundant in
near-surface ocean waters than in the deeper euphotic zone
(Arenovski et al., 1995; Caron, 2000). Moreover, previous
studies have demonstrated that pigmented and
non-pigmented nanoflagellates had similar grazing rates on
heterotrophic bacteria (Hall et al., 1993).

Detection of label in plastid 16S rRNA

To further test the mixotrophy hypothesis we examined
whether the labelled fraction contained plastid DNA using
primers designed specifically for the 16S rRNA sequence
in chloroplast DNA (Table S1). We designed these
primers specifically to amplify plastid 16S rRNA genes,
but not Prochlorococcus and Synechococcus 16S rRNA,
since the latter would have dominated our signal. This
meant we did not recover as many plastid sequences as
we might have if we had used published plastid 16S rRNA
primers (Fuller et al., 2006), but this was an unavoidable
limitation, given the experimental design.

Two of the primer sets for plastid 16S rRNA (primer
sets 6 and 15, Table S1) yielded PCR products of the
expected size, and in the case of primer set 15 the
product was long enough (approximately 650 bp) to be
sequenced and analysed. Although it is difficult to
determine the exact affiliation of these chloroplast sequences
given the short length of the amplified PCR product, and
the limited coverage of chloroplast sequences from
different plastid-containing phylogenetic groups in the
database, the phylogenetic analysis showed that the amplified
sequences from the labelled fraction were indeed from
chloroplasts (Fig. S5). Furthermore, the phylogenetic
analysis showed that the closest relatives of the
chloroplasts identified in our labelled fraction were
Bolidomonas mediterranea and diatom chloroplasts
(Fig. S5). As there are a limited number of chloroplast
sequences representing other groups of Stramenopiles in
the databases, and given the short size of the analysed
product, the exact phylogenetic affiliation of these
sequences is not entirely clear. The key finding, however,
is that all of the sequences obtained cluster with
chloroplasts, indicating that the heavy label ended up in
eukaryotic cells capable of photosynthesis.

Analysis  of  alternative  routes  for  label  incorporation 

In this and other types of labelling experiments with
natural populations, the possibility that the isotopic label
might have been acquired by protists via a route other
than phagotrophic predation must be considered. For
example, it is conceivable that the label might have
passed through a dissolved phase, either organic or
inorganic, and been acquired through non-phagotrophic
nutrient uptake. Alternatively, the label could have been initially
acquired by bacterial heterotrophs that were
subsequently grazed by phagotrophs. Below we consider each
of these possibilities in turn, and present evidence
that they do not appear to be playing a role in these
experiments.

The labelled cyanobacterial biomass could have been
transformed to dissolved inorganic carbon (DIC) through
respiration, either by the picocyanobacteria themselves or
by heterotrophs. Had a substantial amount of the
added biomass been respired, that labelled carbon would
have become broadly available for fixation by all of the
autotrophs in the sample, which would then appear in the
labelled fraction. In fact, the most abundant sequences in
the unlabelled rRNA-derived clone libraries - the
photoautotrophic Chlorophyta - were not represented in the
labelled fraction (Figs 1B and C and 2, Figs S2-S4). This
demonstrates that no significant quantity of labelled DIC
was available for photosynthetic fixation, and passage of
the label through the dissolved carbonate pool can be
excluded.

Another possibility would be that the initially supplied,
isotopically labelled biomass might have entered the
dissolved organic carbon (DOC) pool by exudation, lysis or
‘sloppy feeding’ by zooplankton. The latter two
mechanisms would result in substantial declines in the
picocyanobacterial population during the experiment; however,
the concentration of picocyanobacteria did not change
dramatically over the 24 h of incubation. In all cases the
initial and final concentration, after 24 h of incubation, of
both Prochlorococcus and Synechococcus was on the order of 10⁵
cells ml⁻¹, suggesting that mechanisms involving cell
death (including lysis and sloppy feeding) did not release
large amounts of biomass into the dissolved phase. To
consider exudation, we can use the Prochlorococcus
addition experiment as an example. Prochlorococcus
MED4 cells were added to the seawater sample at a
concentration of 1.7 × 10⁵ per ml and typically contain
about 60 fg of carbon per cell (Bertilsson et al., 2003). If
we imagine that the added Prochlorococcus could
somehow exude all of their initial labelled carbon as DOC
- while suffering no great decline in cell numbers - this is
equivalent to the addition of 0.9 µM of ¹³C-DOC, clearly an
upper limit for the potential contribution of the isotopically
labelled Prochlorococcus to the DOC pool. Typical
surface total DOC concentrations at station ALOHA,
where the samples for this study were taken, are around
75 µM, of which 40 µM is likely refractory organic matter
that turns over very slowly (Carlson, 2002). Hence there is
roughly 35 µM of labile DOC available for rapid
heterotrophic consumption. Addition of Prochlorococcus-derived
¹³C-DOC to this could result in a 36 µM pool of
labile DOC with maximum ¹³C content of 3.5 atom%,
which is in turn the upper limit for labelling by DOC
consumption. Similar considerations limit the ¹³C content of
DOC in the Synechococcus addition experiments to 10.6
atom%.
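
The upper-bound argument above is simple arithmetic, and a short sketch makes the steps explicit. It rests on several assumptions that are not stated in the text: the concentrations are read as micromolar, the added cell density is read as 1.7 × 10⁵ cells per ml (an exponent inferred from the quoted 0.9 µM result), carbon is converted at 12 g per mol, and the unlabelled labile pool is given a natural ¹³C abundance of about 1.1 atom%. The function names are illustrative only.

```python
FG_PER_G = 1e15   # femtograms per gram
ML_PER_L = 1e3    # millilitres per litre


def exuded_doc_uM(cells_per_ml, fg_c_per_cell, g_per_mol_c=12.0):
    """DOC (micromol C per litre) released if every added cell exuded all of its carbon."""
    g_c_per_l = cells_per_ml * fg_c_per_cell / FG_PER_G * ML_PER_L
    return g_c_per_l / g_per_mol_c * 1e6


def max_atom_percent(label_uM, labile_uM, natural_13c=0.011):
    """13C atom% of the labile DOC pool after mixing in fully labelled DOC."""
    c13 = label_uM + labile_uM * natural_13c
    return 100.0 * c13 / (label_uM + labile_uM)


added = exuded_doc_uM(1.7e5, 60.0)   # cell density exponent is an inferred value
labile = 75.0 - 40.0                 # total DOC minus refractory DOC, micromol/L
print(f"exuded DOC upper bound: {added:.2f} uM")
print(f"max 13C content: {max_atom_percent(added, labile):.1f} atom%")
```

Under these assumptions the sketch gives roughly 0.85 µM of exuded carbon and about 3.4 atom% ¹³C; using the rounded 0.9 µM from the text instead gives about 3.6 atom%. Both bracket the 3.5 atom% quoted above, which suggests, though the text does not say so, that natural ¹³C abundance was included in the original estimate.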

Next, we consider the extent of labelling of the heavy
RNA fractions in our incubation experiments. The
difference in buoyant density between heavy and light RNA
fractions in these experiments ranged from 0.034 to
0.078 g ml⁻¹ (Fig. 3), equal to or exceeding the buoyancy
differences (0.035-0.04 g ml⁻¹) observed by Lueders and
colleagues (2004) for 100% ¹³C-labelled SSU rRNA. This
large difference in buoyant density suggests that the
heavy fractions analysed in this experiment were highly
labelled, likely in excess of 90 atom% ¹³C. This is far
greater than the 3-11% possible from DOC consumption,
even under the assumption of maximally rapid exudation
by the added cyanobacteria. The buoyancy differences
observed here in excess of the ~0.04 g ml⁻¹ reported by
Lueders and colleagues (2004) may reflect ¹⁵N incorporation
and/or differences in centrifugation conditions. In
any event, the heavy RNA in these experiments is much
too highly labelled to derive from heterotrophic
consumption of DOC.

A third, even more mechanistically complicated
possibility is the direct and specific consumption of
picocyanobacteria by heterotrophic bacteria or the consumption of
labelled DOC exuded by, or otherwise released from,
the picocyanobacteria by those heterotrophs. Protistan
predators could then graze on these labelled heterotrophs.
If this occurred, the 18S sequences observed in the heavy
fraction would reflect grazing activity, though not
specifically on Prochlorococcus or Synechococcus. Under
this scenario, a subset of heterotrophic bacteria would
become highly labelled, and their RNA should be found in
the heavy fraction. To address this possibility, we
constructed 16S rRNA clone libraries as described in
Experimental procedures. If there had been transfer of labelled
organic matter through heterotrophic bacteria at the level
needed to fractionate differentially in a CsTFA gradient we
would expect to find 16S rRNA sequences from
heterotrophic bacteria in the labelled fractions. Forty-three clones from the labelled
fractions were sequenced. Seventeen clones came from
the fraction obtained from the bottles inoculated with
labelled Prochlorococcus MED4 and in all cases the
best BLASTN match for those sequences corresponded
to Prochlorococcus marinus. Similarly, 26 clones came
from the fraction obtained from the bottles inoculated with
labelled Synechococcus WH8102 and in all cases the
best BLASTN match corresponded to Synechococcus.
Additionally, 11 clones from the unlabelled fraction
from the Prochlorococcus experiment were sequenced
and 18% of those corresponded to P. marinus, while the
rest were sequences from heterotrophic bacteria. These
results demonstrate that 16S rRNA compositions of the
labelled and unlabelled fractions were indeed distinct, and
that heterotrophic bacteria did not appear to become
highly labelled over the course of the incubation. We thus
conclude that the labelled eukaryotes did not obtain their
label indirectly via predation of heterotrophic bacteria.

Conclusions  and  implications 

The reproducibility and internal consistency of the
results obtained in the study indicate that the use of
RNA-SIP for studying marine microbial food-webs in
situ has tremendous potential. There are a multitude of
variations on this experimental design that could yield
many insights into the specific pathways of the flow of
carbon and energy in the marine food-web. These
particular results also reveal that a significant fraction of
the eukaryotes that we identified as grazing specifically
on Prochlorococcus and Synechococcus were likely
mixotrophs - i.e. cells that utilize both phototrophy and
phagotrophic heterotrophy as a way of obtaining
nutrients and energy (Raven, 1997; Jones, 2000). While a
few studies have provided evidence of the importance of
mixotrophy in marine aquatic environments (Arenovski
et al., 1995; An-Yi et al., 2007; Unrein et al., 2007), this
is the first study to identify marine mixotrophs through
their grazing activity on specific prey.

The adoption of mixotrophy as a survival strategy under
oligotrophic oceanic conditions might confer a fitness
advantage for a number of reasons (Raven, 1997). First,
phagotrophy may be a way for relatively large eukaryotic
cells to acquire inorganic nutrients such as N, P and Fe
in oligotrophic waters. Arenovski and colleagues (1995)
presented experimental evidence of a decrease in the
abundance of mixotrophic phototrophs under nutrient
enrichment conditions, suggesting that phagotrophy is
used under low dissolved nutrient concentrations,
conditions that are normal in surface oligotrophic water. With
their larger surface to volume ratio, picocyanobacteria like
Prochlorococcus and Synechococcus likely have an
advantage over larger eukaryotic cells in acquiring
dissolved nutrients. Consuming cyanobacteria may also be a
way for the larger cells to increase their relative fitness by
reducing the abundance of their competitors for nutrients.
Mixotrophy has been linked to survival of nanoflagellates
under nutrient limitation (Unrein et al., 2007) and it has
been shown that algal flagellates increase bacterivory
under phosphate limitation (Nygaard and Tobiesen,
1993). Moreover, the metabolic costs of adding
phagotrophic machinery to an otherwise photosynthetic
metabolism may be rather low in comparison with the
potential benefits (Raven, 1997).

Predation  by  mixotrophs  also  has  implications  for  our 
understanding of the population dynamics of marine picocyanobacteria.
While picocyanobacteria are generally the
numerically dominant phytoplankton in stratified oligotrophic
open-ocean waters, they usually do not bloom
(i.e.  increase  markedly  in  cell  concentrations)  in  response 
to  episodic  nutrient  supplies  (Mann  and  Chisholm,  2000). 
This  behaviour  has  been  explained  by  concomitant 
increases  in  grazing  rates,  implying  that  these  grazers  are 
able  to  respond  very  quickly  to  shifts  in  prey  growth  and 
quality.  Our  identification  of  mixotrophic  predators  may 
shed  further  light  on  this  dynamic:  eukaryotic  mixotrophs 
directly  exploit  the  same  episodic  supplies  of  dissolved 
nutrients  as  their  picocyanobacterial  prey,  and  thus  could 
grow  faster,  through  stimulated  autotrophy,  as  nutrients 
become  more  abundant.  As  their  populations  grow  and 
consume the available nutrients, they may shift towards
phagotrophy, increasing the mortality rate of cyanobacteria,
preventing bloom formation even in the face of rapid
growth  rates.  This  hypothesis  is  directly  testable  using  the 
approach  we  have  described. 

As evidence increasingly points towards the mixotrophic
capabilities of both nominally photo- and heterotrophic
organisms, it is becoming clear that a sharp
distinction between photosynthetic and predatory lifestyles
is a false dichotomy. It is likely that marine protists
utilize  a  spectrum  of  trophic  strategies,  ranging  between 
obligate  photoautotrophic  and  strictly  phagotrophic  end 
members  and  occupying  nearly  all  gradations  in  between 
(Sanchez-Puerta et al., 2007). Further investigations
at other ocean sites and different depths are
needed to confirm the potential importance of mixotrophy
as a common metabolic strategy for grazers feeding on
picocyanobacteria.

Experimental  procedures 

Sampling  and  incubation  conditions 

Prochlorococcus MED4 and Synechococcus WH8102 were
grown for 4 days at 19°C under continuous cool white light
(16.6 µmol Q m⁻² s⁻¹) in artificial seawater medium (Rippka
et al., 2000) amended with 6 mM ¹³C-sodium bicarbonate and
800 µM ¹⁵N-ammonium chloride. Cells were harvested by
centrifugation at 8000 g for 15 min, washed twice in unlabelled
artificial seawater medium and re-suspended in the
same medium. Cells were counted by flow cytometry to
estimate the volume of inoculum needed in the
experiment to reach a final concentration of picocyanobacteria
similar to that found in natural
samples (approximately 10⁵ cells ml⁻¹). Final isotopic enrichment
of the cultures was measured by mass spectrometry at the
UC Davis Stable Isotope Facility using on-line combustion
(Europa Integra): atom% ¹³C was 98.86% for Prochlorococcus MED4
and 84.20% for Synechococcus WH8102, and
atom% ¹⁵N was 61.13% for Prochlorococcus MED4 and
39.83% for Synechococcus WH8102. These cultures were then
transported  overnight  in  the  dark  to  the  field  site  for  use  in  the 
grazing  experiments. 
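
The inoculum estimate described above amounts to a simple dilution calculation. The following is a minimal sketch, not the authors' procedure: the function name, the stock density and the target concentration used in the example are illustrative assumptions, and any picocyanobacteria already present in the seawater sample are ignored.

```python
def inoculum_volume_ml(stock_cells_per_ml, target_cells_per_ml, sample_volume_ml):
    """Volume of labelled culture to add to a seawater sample.

    Solves C_stock * V = C_target * (V_sample + V) for V, so the dilution
    caused by the added inoculum itself is taken into account.
    """
    if stock_cells_per_ml <= target_cells_per_ml:
        raise ValueError("stock must be more concentrated than the target")
    return (target_cells_per_ml * sample_volume_ml
            / (stock_cells_per_ml - target_cells_per_ml))


# Hypothetical numbers: a stock counted at 1e8 cells per ml by flow cytometry,
# a target of 1e5 cells per ml and a 500 ml bottle.
volume = inoculum_volume_ml(1e8, 1e5, 500.0)
print(f"add {volume:.3f} ml of culture")
```

With these assumed values the inoculum is about half a millilitre, small enough that it barely changes the volume of the bottle.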

Samples of ocean surface water (3-5 m depth) were collected
in 500 ml acid-cleaned bottles during
March 2006 as part of HOT cruise 179, and inoculated with
either labelled Prochlorococcus MED4 or Synechococcus
WH8102 at a final concentration of 10⁵ cells ml⁻¹. All shipboard
incubations were performed in duplicate and analysed
independently. The incubations were set in an on-deck incubator
through which surface seawater was constantly re-circulated
to maintain temperature. Two samples of 200 ml were
collected at the beginning of the experiment as a control to
identify the initial eukaryotic community. Samples of 250 ml
were collected from the bottles with added labelled Prochlorococcus
and Synechococcus after 24 h of incubation. The
24 h period allowed enough time for the labelled isotopes to
be incorporated into the nucleic acids of the grazers yet
prevented both significant changes in the eukaryotic community
and potential indirect incorporation of labelled isotopes
that could occur during an extended incubation. All water
samples were filtered through 0.2-µm-pore-size membranes
and preserved in RNAlater at -80°C until analysis.

DNA  and  RNA  extraction,  gradient  fractionation 
and  cDNA  synthesis 

RNAlater was removed by washing the filters with cold 70%
ethanol. DNA was extracted following the protocol of Coffroth and
colleagues (1992). Filters were placed in 0.5 ml of
CTAB (hexadecyltrimethyl ammonium bromide) buffer (1.4 M
NaCl, 20 mM EDTA, 100 mM Tris-HCl pH 8.0, 0.2% CTAB
and 0.2% 2-mercaptoethanol) and the tubes were placed in
a mini-bead beater (BioSpec Products, Bartlesville, OK,
USA) and vortexed for 2 min at the maximum speed
(4800 r.p.m.) to re-suspend the cells. Proteinase K was
added to a final concentration of 0.1 mg ml⁻¹ and samples
were incubated at 65°C for 1 h. An equal volume of chloroform
was added, mixed and spun at 14 000 g for 10 min. The
aqueous layer was transferred to a new tube and DNA was
extracted with an equal volume of phenol : chloroform :
isoamyl alcohol (25:24:1). Finally, DNA was precipitated by
addition of 2 vols of cold 95% ethanol without
additional salt. The pellet was washed twice with cold 70%
ethanol, dried and re-suspended in water.

For RNA extraction, filters were placed in 100 µl of 10 mM
Tris-HCl pH 8.0, 4 µl of RNase inhibitor (Ambion, Austin, TX,
USA) and 2 µl lysozyme (50 mg ml⁻¹). Samples were incubated
for 30 min at 37°C. An additional 2 µl of the 50 mg ml⁻¹
lysozyme solution was added and the samples were incubated
again for 30 min at 37°C. Total RNA was immediately
extracted with a mirVana RNA isolation kit (Ambion, Austin, TX,
USA).

Labelled and unlabelled RNA were separated by density
gradient centrifugation, performed according to the protocol
of Lueders and colleagues (2004). Centrifugation media were
prepared by mixing 4.5 ml of a 2 g ml⁻¹ CsTFA stock solution
(Amersham Pharmacia Biotech), up to 1 ml of gradient buffer
(GB; 0.1 M Tris-HCl pH 8; 0.1 M KCl; 1 mM EDTA) and RNA
extracts (up to 500 ng). Additionally, 175 µl of formamide was
added to centrifugation media to guarantee that RNA was
denatured. The average density of all prepared gradients
was checked with an AR200 digital refractometer (Leica
Microsystems), and adjusted by adding small volumes of Cs
salt solution or gradient buffer, if necessary. 18S rRNA was
resolved in CsTFA gradients with an average density of
1.8316 g ml⁻¹ at 20°C. Quick-Seal Polyallomer tubes, 3.9 ml
(Beckman Instruments), were filled up with centrifugation
media plus sample, and centrifuged in an Optima TLX
ultracentrifuge using a TLN100 vertical rotor (Beckman
Instruments). Centrifugation conditions were > 60 h at
61 000 r.p.m. (131 000 g).
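
The rotor speed and the relative centrifugal force quoted above are linked by the standard relation RCF = 1.118 × 10⁻⁵ × r × (r.p.m.)², with r in centimetres. The sketch below (function names are ours, chosen for illustration) converts between the two and back-calculates the rotor radius implied by the reported pair of values; the source does not say whether that radius is the average or the maximum radius of the rotor.

```python
RCF_CONSTANT = 1.118e-5  # standard factor for r in cm and speed in r.p.m.


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (multiples of g) at a given radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def radius_from_rcf(rpm, rcf):
    """Rotor radius in cm implied by a speed and a reported RCF."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


# Values reported in the text: 61 000 r.p.m. and 131 000 g.
implied_radius = radius_from_rcf(61_000, 131_000)
print(f"implied radius: {implied_radius:.2f} cm")
```

For the reported conditions the implied radius is a little over 3 cm, which is a quick consistency check when transferring the protocol to a different rotor.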

Centrifuged gradients were fractionated from bottom to top
into 12 equal fractions (~400 µl). A precisely controlled flow
rate was achieved by displacing the gradient medium with
water at the top of the tube using a syringe pump (Harvard
Apparatus). The density of 15 µl from each collected fraction
was determined using an AR200 digital refractometer (Leica
Microsystems). Total RNA was precipitated with 1 vol. of isopropanol.
Precipitates from gradient fractions were washed
once with 70% ethanol and re-suspended in 25 µl of EB for
subsequent determination of total RNA using RiboGreen
(Molecular Probes, Invitrogen, Carlsbad, CA, USA) assays.

Primers for eukaryotic 18S rRNA genes were designed
using the Design Probes tool from the ARB software (Ludwig
et al., 2004): EukF (5'-GGGTTCGATTCCGGAGAG-3') and EukR
(5'-CCGTGTTGAGTCAAATT-3') (Integrated DNA Technologies,
Coralville, IA, USA). The database used contained
27 887 complete sequences, all of eukaryotic origin. The EukF
primer matched 19 378 sequences with 0 mismatches and
23 459 sequences with one mismatch. The EukR primer matched
25 739 sequences with 0 mismatches and 27 447 sequences
with one mismatch. They were tested in two cultures of
Cafeteria, two cultures of Paraphysomonas and one culture
of Dunaliella, giving in all cases the expected-size PCR
product of approximately 830 bp.
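
The primer coverage figures above come from counting database sequences that contain a site matching the primer with at most a given number of mismatches. The following sketch illustrates that idea with a plain sliding-window Hamming distance. It is not the ARB probe-match implementation; the function names and the toy target sequences are hypothetical, and a reverse primer such as EukR would have to be searched as its reverse complement.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_mismatch_count(primer, target):
    """Fewest mismatches between the primer and any window of the target."""
    n = len(primer)
    if len(target) < n:
        return None
    return min(hamming(primer, target[i:i + n])
               for i in range(len(target) - n + 1))


def count_matching(primer, targets, max_mismatches):
    """How many targets contain a site within max_mismatches of the primer."""
    count = 0
    for target in targets:
        best = best_mismatch_count(primer.upper(), target.upper())
        if best is not None and best <= max_mismatches:
            count += 1
    return count


def coverage_percent(matched, total):
    """Share of database sequences matched, in percent."""
    return 100.0 * matched / total


EUKF = "GGGTTCGATTCCGGAGAG"
toy_targets = [  # hypothetical sequences, for illustration only
    "AAAA" + EUKF + "TTTT",         # exact site
    "CCGGGTTCGATACCGGAGAGCC",       # site with one mismatch
    "ACGTACGTACGTACGTACGTACGT",     # no site
]
print(count_matching(EUKF, toy_targets, 0), count_matching(EUKF, toy_targets, 1))
print(round(coverage_percent(19378, 27887), 1))  # EukF, 0 mismatches
```

Applied to the counts reported in the text, the same percentage calculation gives roughly 69% coverage for EukF and 92% for EukR with no mismatches.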

Total  RNA  (0.5-5  ng)  from  fractions  containing  highly 
labelled  and  unlabelled  RNA  was  reverse  transcribed  with  the 
specific  primers  using  the  ThermoScript  RT-PCR  system 
(Invitrogen,  Carlsbad,  CA,  USA).  Reverse  transcription  was 
performed for 2 h at 50°C.

PCR reactions were performed using Taq DNA polymerase
from NEB and primers at 2 µM concentration. After 5 min
at 95°C, 35 cycles of denaturation (95°C, 45 s), annealing
(52°C, 1 min), elongation (72°C, 1 min) and a final elongation
step (72°C, 10 min) were run in an MJ Research PTC 100
Thermal Cycler. PCR products were cleaned up using a
QIAquick PCR purification kit (Qiagen, Valencia, CA, USA)
and cloned into either TOPO TA cloning vector (Invitrogen,
Carlsbad, CA, USA) or pGEM-T cloning vector (Promega,
Madison, WI, USA). Inserts were sequenced either at
Genaissance Pharmaceuticals (New Haven, CT; now Cogenics,
MA, USA) using primers for the T7 promoter region or in
house using the same primer and the BigDye sequencing kit
(Applied Biosystems, Foster City, CA, USA) with 1 min denaturation
and 25 cycles of 95°C for 30 s, 50°C for 20 s and 60°C for 4 min,
followed by a hold at 4°C. The reactions were then purified by
ethanol precipitation and run on an ABI PRISM 3730 (Applied
Biosystems) capillary DNA sequencer.

16S rRNA genes from bacteria present in the heavy fractions
were cloned and sequenced using universal primers 9F
(5'-GAGTTTGATYMTGGCTC) and 1509R (5'-GYTACCTTGTTACGACTT)
(Integrated DNA Technologies, Coralville, IA,
USA). PCR and cloning were performed as described above
but elongation at 72°C was extended to 2 min. Fragments
were sequenced using the ABI PRISM BigDye Terminator v3.1
Cycle Sequencing Kit (Applied Biosystems, Warrington, UK)
and primers for the T7 promoter region.

Taxonomic  affiliation  and  phylogenetic  analysis 

Vector contamination was assessed using VecScreen
(http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html). On the
basis of the evaluation by the check_chimera program of the
Ribosomal Database Project (Maidak et al., 2001), only
sequences that showed no evidence of potential chimeric
gene artefacts were analysed.

Preliminary taxonomic affiliation of the sequences was
determined using blastn against the GenBank nr database
(March 2005). Phylogenetic analysis was based on partial
sequences trimmed to the shortest common denominator.
A first analysis to confirm the taxonomic affiliation of the
sequences, and to obtain a rough picture of the overall phylogenetic
tree, was performed using ARB software. Sequences
were aligned against the eukaryotic database (SSRef release
90 12.05.2007, SILVA database project http://www.arb-silva.de/
with 27 887 pre-aligned sequences) (Pruesse et al.,
2007) using the Fast Alignment tool in ARB software version
07.02.20 (Ludwig et al., 2004). Alignments
were edited manually and sequences were added to
the backbone tree using ARB's 'Parsimony insertion' feature.

For maximum likelihood (ML), neighbour joining (NJ)-distance
and maximum parsimony (MP) analyses, alignments
were generated using MAFFT (Katoh et al., 2002;
2005) and edited manually using Sequence Alignment
Editor v2.0 (http://tree.bio.ed.ac.uk/software/seal/). Maximum
parsimony analysis was performed using the 'fast' stepwise-addition
algorithm in PAUP 4.0b10 (Altivec) with 1000 bootstrap
replicates. For each alignment the best DNA
substitution model was evaluated using MrModeltest 2.2
(Nylander, 2004), which ranked General Time Reversible +
gamma + proportion invariant (GTR+Γ+I) as the best model in all
cases. Maximum likelihood analysis was performed using the
software PHYML_v2.4.4 (Guindon and Gascuel, 2003) with
GTR as the substitution model and 100 bootstrap replicates.
Neighbour joining-distance analysis was performed using
PAUP 4.0b10 (Altivec), also with GTR as the substitution
model, with 1000 bootstrap replicates and the values of
gamma shape and proportion of invariable sites estimated
by PHYML. Trees were visualized and plotted using NJplot
v2.1 (Perriere and Gouy, 1996).


T-RFLP  analysis 

Fluorescently  labelled  PCR  products  for  the  T-RFLP  analysis 
were  generated  by  the  PCR  protocol  described  above,  using  a 
FAM-labelled  forward  primer.  PCR  products  were  digested 
with the restriction endonucleases HhaI and RsaI (New
England Biolabs, Ipswich, MA, USA). The resulting fluorescent
terminal fragments were resolved and analysed at the Roy J.
Carver Biotechnology Center (University of Illinois at Urbana-
Champaign) using an ABI Prism 3730xl Analyser automated
sequencer,  and  GeneMapper  version  3.7  software. 
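
Because only the forward primer carries the fluorescent label, each amplicon contributes one detectable terminal fragment per enzyme, running from the labelled 5' end to the first cut site. The sketch below predicts that length in silico for HhaI (GCG^C) and RsaI (GT^AC). The function name and the example amplicon are hypothetical, and the code is an illustration of the principle rather than part of the analysis pipeline used in the study.

```python
# Recognition site and number of site bases left on the 5' fragment
# of the labelled strand after cutting.
ENZYMES = {
    "HhaI": ("GCGC", 3),  # cuts GCG^C
    "RsaI": ("GTAC", 2),  # cuts GT^AC
}


def terminal_fragment_length(amplicon, enzyme):
    """Length of the labelled 5'-terminal fragment after digestion.

    Returns the full amplicon length if the enzyme has no site.
    """
    site, offset = ENZYMES[enzyme]
    position = amplicon.upper().find(site)
    if position == -1:
        return len(amplicon)
    return position + offset


toy_amplicon = "AAAAAGCGCTTTTTGTACCC"  # hypothetical sequence
for name in ENZYMES:
    print(name, terminal_fragment_length(toy_amplicon, name))
```

If both enzymes were applied to the same reaction, the observed fragment would be the shorter of the two predicted lengths.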

Clustering  of  the  different  T-RFLP  profiles  was  performed 
using  the  Self-Organizing  Tree  Algorithm  (SOTA)  from  the 
GEPAS  4.0  (GEPAS  website  http://www.gepas.org). 


Chloroplast  16S  rRNA  analysis 

Labelled fractions from both Prochlorococcus and Synechococcus
grazers were tested for the presence of 16S rRNA
chloroplast sequences. Specific oligonucleotides against
chloroplast sequences (SSRef release 90 12.05.2007, SILVA
database project http://www.arb-silva.de/) were designed using
the Design Probes tool from the ARB software (Ludwig et al.,
2004). Although a total of 16 sets of primers were used in the
experiment (Table S1), only the set of primers 15F
(5'-TTAACTCAAGTGGCGGACGG) and 15R (5'-AGTGTTAGTAATAGCCCAGTA)
gave a PCR product long enough to be
sequenced. PCR reactions were performed using Taq DNA
polymerase from NEB and primers at 2 µM concentration.
After 5 min at 95°C, 40 cycles of denaturation (95°C, 45 s),


©  2008  The  Authors 

Journal  compilation  ©  2008  Society  for  Applied  Microbiology  and  Blackwell  Publishing  Ltd,  Environmental  Microbiology,  11,  512-525 




522  J. Frias-Lopez, A. Thompson, J. Waldbauer and S. W. Chisholm


annealing (56°C, 1 min), elongation (72°C, 1 min) and a final
elongation step (72°C, 10 min) were run in an MJ Research
PTC 100 Thermal Cycler. PCR products were cleaned up using
a QIAquick PCR purification kit (Qiagen, Valencia, CA, USA),
cloned into TOPO TA cloning vector (Invitrogen, Carlsbad,
CA, USA) and sequenced as described above.

Nucleotide  sequence  accession  numbers 

Ribosomal  RNA  sequences  have  been  deposited  at 
GenBank/EMBL  under  Accession  Nos  EF695076-EF695247 
and EU499951-EU500232.

Supporting information

Additional  Supporting  Information  may  be  found  in  the  online 
version  of  this  article: 

Fig. S1. Phylogenetic analysis of the sequences derived
from the 18S rDNA sequences in the indigenous microbial
community. Unrooted phylogenetic tree inferred by maximum
likelihood (ML) analysis. A total of 1090 positions,
including gaps (sequences ranging from a minimum of
585 bp up to 893 bp), from an alignment of 203 partial
sequences were used. Bootstrap values over 50% are indicated
on the internal branches obtained from ML, neighbour
joining-distance methods (NJ-Dist) and maximum
parsimony (MP) (in the order ML/NJ-Dist/MP). Bootstrap
values < 50%, or not supported in at least two of the analyses,
have been omitted. The gamma distribution parameter
(α) was estimated at 0.520; and the proportion of invariant
sites (I) was 0.015. The scale bar indicates 10% divergence.
The sequences from the duplicate biological samples are
indicated as (dhot1) and (dhot2). Classification is based on
Adl and colleagues (2005) and Not and colleagues (2007). All
groups correspond to first and second rank according to Adl
and colleagues (2005) except *Super-group and **Phylum
(Shalchian-Tabrizi et al., 2006).

Fig. S2. Phylogenetic analysis of the sequences derived
from the 18S rDNA sequences in the unlabelled fractions
from the experiments (time 0 and blue circles in Fig. 2).
Unrooted SSU rRNA-derived library phylogenetic tree of
eukaryotes inferred by maximum likelihood (ML) analysis. A
total of 1058 positions, including gaps (sequences
ranging from a minimum of 512 bp up to 886 bp), from an
alignment of 200 partial sequences were used. Bootstrap
values over 50% are indicated on the internal branches
obtained from ML, neighbour joining-distance methods
(NJ-Dist) and maximum parsimony (MP) (in the order
ML/NJ-Dist/MP). Bootstrap values < 50%, or not supported in at
least two of the analyses, have been omitted. The gamma
distribution parameter (α) was estimated at 0.594; and the
proportion of invariant sites (I) was 0.000. The scale bar
indicates 10% divergence. The sequences coming from the
duplicate biological samples are indicated as (A) and (B).
Clone colour code: dark red: sequences from the time 0
sample, representing the metabolically active initial eukaryotic
microbial community; purple: sequences from the unlabelled
eukaryotic RNA obtained from the samples incubated
with Prochlorococcus; orange: sequences from the unlabelled
fraction from the samples incubated with Synechococcus.
Classification is based on Adl and colleagues (2005) and
Not and colleagues (2007). All groups correspond to first
and second rank according to Adl and colleagues (2005)
except when noted as follows: *Super-group and **Phylum
(Shalchian-Tabrizi et al., 2006). ***Unidentified chloroplastida,
BLAST results gave no clear match and the sequences
did not cluster clearly with any of the second-rank groups
used in the tree that could indicate the exact affiliation of the
sequence.

Fig. S3. Phylogenetic analysis of the sequences derived
from the labelled 18S rDNA sequences (red circles in Fig. 2)
from the experimental bottles amended with labelled Prochlorococcus
cells. Unrooted 18S rRNA-derived library phylogenetic
tree of eukaryotes inferred by maximum likelihood (ML)
analysis. A total of 1145 positions, including gaps
(sequences ranging from a minimum of 545 bp up to 980 bp),
from an alignment of 192 partial sequences were used. Bootstrap
values over 50% are indicated on the internal branches
obtained from ML, neighbour joining-distance methods
(NJ-Dist) and maximum parsimony (MP) (in the order
ML/NJ-Dist/MP). Bootstrap values < 50%, or not supported in at
least two of the analyses, have been omitted. The gamma
distribution parameter (α) was estimated at 0.512; and the
proportion of invariant sites (I) was 0.000. The scale bar
indicates 10% divergence. The sequences coming from the
duplicate biological samples are indicated as (A) and (B). Classification
is based on Adl and colleagues (2005) and Not and
colleagues (2007). All groups correspond to first and second
rank according to Adl and colleagues (2005) except when
noted as follows: *Super-group and **Phylum (Shalchian-Tabrizi
et al., 2006). ***Unidentified stramenopiles, BLAST
results gave no clear match and the sequences did not
cluster clearly with any of the second-rank groups used in the
tree that could indicate the exact affiliation of the sequence.
Fig. S4. Phylogenetic analysis of the sequences derived
from the labelled 18S rDNA sequences (red circles in Fig. 2)
from the experimental bottles amended with labelled Synechococcus
cells. Unrooted 18S rRNA-derived library phylogenetic
tree of eukaryotes inferred by maximum likelihood
(ML) analysis. A total of 1156 positions, including gaps
(sequences ranging from a minimum of 507 bp up to 977 bp),
from an alignment of 188 partial sequences were used. Bootstrap
values over 50% are indicated on the internal branches
obtained from ML, neighbour joining-distance methods
(NJ-Dist) and maximum parsimony (MP) (in the order
ML/NJ-Dist/MP). Bootstrap values < 50%, or not supported in at
least two of the analyses, have been omitted. The gamma
distribution parameter (α) was estimated at 0.543; and the
proportion of invariant sites (I) was 0.036. The scale bar
indicates 10% divergence. The sequences coming from the
duplicate biological samples are indicated as (A) and (B). Classification
is based on Adl and colleagues (2005) and Not and
colleagues (2007). All groups correspond to first and second
rank according to Adl and colleagues (2005) except when
noted as follows: *Super-group and **Phylum (Shalchian-Tabrizi
et al., 2006). ***Unidentified stramenopiles, BLAST
results gave no clear match and the sequences did not
cluster clearly with any of the second-rank groups used in the
tree that could indicate the exact affiliation of the sequence.
Fig. S5. Phylogenetic tree of 16S rRNA sequences from chloroplasts
and bacteria inferred by maximum likelihood (ML)
analysis. Blue: cyanobacterial 16S rRNA sequences. Green:
sequences originating from the labelled fraction of the
Prochlorococcus inoculation experiment. Orange: sequences
originating from the labelled fraction of the Synechococcus
inoculation experiment. (A) and (B) represent the two biological
replicates in the experiments. A total of 724 positions were
used, including gaps (sequences ranging from a minimum
of 358 bp up to 668 bp), from an alignment of 103 partial
sequences. Bootstrap values over 50% are indicated on the
internal branches obtained from ML, neighbour joining-distance
methods (NJ-Dist) and maximum parsimony
(MP) (in the order ML/NJ-Dist/MP). Bootstrap values < 50%,
or not supported in at least two of the analyses, have been
omitted. The proportion of invariant sites (I) was 0.241. The
scale bar indicates 10% divergence. An archaeal sequence
(Sulfolobus acidocaldarius) was used as out-group.

Table S1. Oligonucleotides used for the amplification of
16S rRNA chloroplast genes from different groups defined
based on the ARB tree (SSRef release 90 12.05.2007) for
this group of sequences. F, forward primer. R, reverse
primer.

Please  note:  Wiley-Blackwell  are  not  responsible  for  the 
content  or  functionality  of  any  supporting  materials  supplied 
by  the  authors.  Any  queries  (other  than  missing  material) 
should be directed to the corresponding author for the article.

Table S1. Oligonucleotides used for the amplification of 16S rRNA chloroplast genes from
different groups defined based on the ARB tree (SSRef release 90 12.05.2007) for this group
of sequences. F, forward primer. R, reverse primer.

Primer set number  Group targeted                 Sequence                     Expected size of amplified product (bp)
1                  Zea and rel.-F                 GAA GTG GTG TTT CCA GTG GC   351
                   Zea and rel.-R                 AAA AGA AGT TCA CGA CCC GT
2                  Chlamydomonas and rel.-F       ACA CGT CAA CGC ACG AGC TG   821
                   Chlamydomonas and rel.-R       TAG CTA GTT GGT GGG GGT AA
3                  Anthocerus and rel.-F          TAA GGA GGG GCT TGC GTT TG   218
                   Anthocerus and rel.-R          GTC ATT GCT TCT TCT CTA AG
4                  Marchantia and rel.-F          CCT TTT CTC AGA GAA GAT GC   813
                   Marchantia and rel.-R          GCG AGG TCG CGA CCC TTT GT
5                  Spirogyra and rel.-F           TAG TCT CCA CCG CCT GGC CA   485
                   Spirogyra and rel.-R           GGC GGG GGA CCA CCA CTG GA
6                  Palmaria_1 and rel.-F          CGC CTT AGC TAC GAT ACT GC   89
                   Palmaria_1 and rel.-R          AGA CGA CAG CTA GGG GAG CA
7                  Ochromonas and rel.-F          CCA CCT GTG TAA GAG GCC GT   581
                   Ochromonas and rel.-R          GGA AGA TCT GAC GTT ACT TG
8                  Palmaria_2 and rel.-F          CTA CGA TAC TGC ACG GAT CG   136
                   Palmaria_2 and rel.-R          TGG GAA GAA CAC CAG AAG CG
9                  Chara and rel.-F               GCA CTG AAC GGA TCA AAT CG   712
                   Chara and rel.-R               AAG GAC TTG CCC TTG GGT GG
10                 Bryophyta and rel.-F           GGA GCG AAA GGA GGA ATC CA   31
                   Bryophyta and rel.-R           ACG CAA GCC CCT CCT TGG GT
11                 Chlorella and rel.-F           CCC CAG GCG GGA TAC TTC ACG  160
                   Chlorella and rel.-R           GGG AGG AAC ACC AAA GGC GA
12                 Chlorarachnion and rel.-F      AGC GAG GGG AGA GAA TGG GA   436
                   Chlorarachnion and rel.-R      GCC CAG AAC TTA AGG GGC AT
13                 Osmunda and rel.-F             AGC AAA AGG GAG GGA TCC GC   398
                   Osmunda and rel.-R             TTG ACA GCG GAC TTA AGG AG
14                 Chaetosphaeridium and rel.-F   CTT GCG TCT GAT TAT GCT AG   175
                   Chaetosphaeridium and rel.-R   ACC CGT AAG CTT TCT TCC T
15                 Skeletonema and rel.-F         TTA ACT CAA GTG GCG GAC GG   650
                   Skeletonema and rel.-R         AGT GTT AGT AAT AGC CCA GTA
16                 Euglenales and rel.-F          GGG GAG TAC GCT TGC AAA AG   153
                   Euglenales and rel.-R          CAT GCA CCA CCT GTG TCT AG



Appendix  B 

Improved  methods  for  isolating  and  validating  indigenous 
biomarkers  in  Precambrian  rocks 

Laura  S.  Sherman,  Jacob  R.  Waldbauer  and  Roger  E.  Summons 


Reprinted  with  permission  from  Organic  Geochemistry 
©  2007  Elsevier  Ltd. 

Sherman,  L.S.,  Waldbauer,  J.R.  and  Summons,  R.E.  (2007)  Improved  methods  for 
isolating  and  validating  indigenous  biomarkers  in  Precambrian  rocks.  Organic 
Geochemistry  38:  1987-2000. 


Organic  Geochemistry  38  (2007)  1987-2000 



Improved  methods  for  isolating  and  validating 
indigenous  biomarkers  in  Precambrian  rocks 

Laura S. Sherman, Jacob R. Waldbauer, Roger E. Summons

Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Joint Program in Chemical Oceanography, Massachusetts Institute of Technology and Woods Hole Oceanographic Institution,
Cambridge, MA 02139, USA

Received  24  July  2007;  accepted  31  August  2007 
Available  online  14  September  2007 


Abstract 

Hydrocarbon  biomarkers  in  highly  mature  Precambrian  rocks  have  the  potential  to  provide  important  information 
about the diversity, ecology, and evolution of early life, but studying them presents special analytical challenges. Extractable
hydrocarbons are present in Archean and most Paleoproterozoic sedimentary rocks in such trace concentrations that
even slight contamination from petroleum-derived materials in situ or during drilling, storage, sampling, handling and laboratory
analysis would compromise the results and, thereby, any consequent inferences. Here we report protocols that we
have  developed  for  the  analysis  of  cores  from  several  recently  completed  deep-time  scientific  drilling  initiatives.  By  paying 
special  attention  to  cutting,  cleaning,  crushing  and  extraction,  it  is  possible  to  significantly  reduce  laboratory  blanks  to 
acceptable  levels.  When  these  methods  are  utilized,  meaningful  variations  in  the  patterns  of  biomarkers  over  stratigraphic 
and  lithologic  boundaries  provide  compelling  evidence  for  syngeneity. 

©  2007  Elsevier  Ltd.  All  rights  reserved. 


1.  Introduction 

Molecular fossils of biogenic lipids (i.e., biomarkers)
preserved over geologic time in sedimentary
rocks of low metamorphic grade can be used
to describe ancient microbiota and environmental
conditions (Eglinton et al., 1964; Brocks et al.,
1999, 2003b). Because biomarkers are the hydrocarbon
skeletons of biogenic precursor lipids, those
that are created through known biosynthetic and
diagenetic pathways convey information about the
identity and physiology of their source organisms
(Brocks and Summons, 2003; Brocks and Pearson,
2005). Biomarker geochemistry is therefore a
potentially powerful tool for establishing the record
of ancient life on Earth and for better understanding
early microbial evolution. However, the application
of biomarker techniques to highly mature
Precambrian organic matter presents unique
challenges, requires the use of special methods and
necessitates extraordinary attention to concerns of
contamination.

Although some Precambrian rocks contain abundant
amounts of syndepositional organic matter
(exceeding 10 wt% in some samples in this study),
very little is in the form of extractable bitumen.
Over billions of years, even under relatively low
grade metamorphic conditions, the majority of the
organic constituents of a sediment are incorporated
into the macromolecular kerogen network (Brocks
and Summons, 2003). Because very few hydrocarbons
are present as extractable biomarker molecules
(sub-ppb quantities), even low level contamination
introduced during drilling, sampling, storage, or
laboratory processing may overprint the original
signatures.

The first purportedly Precambrian biomarkers were amino acids identified
by P.H. Abelson in the late 1950s (Barghoorn, 1957); numerous studies in
the 1960s subsequently reported finding fatty acids, amino acids,
n-alkanes, porphyrins and acyclic isoprenoids in Precambrian rocks
(McKirdy, 1974; Hayes et al., 1983; Imbus and McKirdy, 1993). However,
these findings were quickly called into question (Hoering, 1966;
Leventhal et al., 1975). Hoering (1966) noted that most high maturity
Archean rocks had thermal histories inconsistent with the preservation
of soluble organic molecules. He also detailed numerous sources of
contamination, including anthropogenic petroleum products, laboratory
equipment and sample storage bags (Hoering, 1966, 1967). Furthermore,
subsequent reports described the potential for younger oils and drill
fluids to penetrate and be preserved in Precambrian units (Barghoorn et
al., 1965; Meinschein, 1965; Nagy, 1970; Sanyal et al., 1971). These
problems overshadowed Precambrian organic geochemistry investigations
for almost thirty years.

More recent studies, including those by Brocks (2001), Brocks et al.
(1999), Brocks and Summons (2003), Eigenbrode (2004) and Dutkiewicz et
al. (2006), have revisited Precambrian organic geochemistry with extreme
care in order to assess the syngeneity of detected molecules. The
current study was undertaken both to gain a better understanding of
early microbial evolution and to develop a protocol for sampling,
handling, and analyzing high maturity Precambrian organic matter.

We examined rocks from two cores (GKP01 and GKF01, hereafter GKP and
GKF) drilled as part of the Agouron Griqualand Drilling Project and
rocks from one core (ABDP9) drilled as part of the Deep Time Drilling
Project of the NASA Astrobiology Drilling Program. These sedimentary
successions contain some of the earliest evidence of the accumulation
and cycling of molecular oxygen in surface environments (Bekker et al.,
2004; Anbar et al., 2007; Kaufman et al., 2007). The Agouron cores were
each drilled without the use of hydrocarbon fluids through more than a
kilometer of Archean sediments. The cores intersect platform and slope
facies of the Transvaal Supergroup, which were deposited on the Kaapvaal
Craton of South Africa between 2670 and 2460 Ma (Armstrong et al., 1986;
Barton et al., 1994; Walraven and Martini, 1995; Sumner and Bowring,
1996). Stratigraphic correlations between the two Agouron cores, based
on the presence of volcanic ash beds, spherule layers, flooding
surfaces, sequence boundaries and lithologic marker beds (Schröder et
al., 2006), enabled intercomparison of the biomarker contents of
correlated beds. Core ABDP9 was drilled through the Pilbara Craton of
Western Australia, intersecting iron formations, shales, mixed clastics
and carbonates. Although drilling difficulties required the use of
petroleum-based additives part way through the coring, the top 200 m
were obtained with minimal contamination. These two drilling projects
are among the first efforts to recover long stretches of Archean strata
with the express purpose of collecting samples for molecular fossil
analysis.

2.  Methodology:  sample  collection  and  processing 

The methods were developed for the extraction and analysis of
Precambrian hydrocarbons with minimal contamination. The methodology is
based on techniques described by Brocks (2001), Brocks et al. (2003a, b)
and Eigenbrode (2004), with refinements as described. The rationale for
key aspects of the methods can be found in Section 4.

2.1.  Drilling 

As described above, the data presented in this study are results of
analysis of rock samples from the Agouron Griqualand Drilling Project
and the Deep Time Drilling Project. Drill cores have several advantages
over outcrop samples for the purposes of biomarker analysis. The surface
exposures of early Precambrian terranes are generally heavily weathered,
meaning they have been exposed to circulating modern surface water and
groundwater that carry contaminant lipids and hydrocarbons. Drilling
below the modern weathering horizon, which can be tens to hundreds of
meters deep, affords rock samples that have not been in contact with
fluids in the recent geologic past. Drill cores also sample long
stretches of continuous stratigraphy, generally much longer intervals
than can be accessed in often sparse surface exposure. Importantly,
drill cores represent a group of samples taken at one time that have all
been subject to the same handling and storage conditions. This
facilitates stratigraphic comparisons of biomarker contents, which can
be valuable both for assessing the syngeneity of the extracted
hydrocarbons and for establishing records of paleoenvironmental change.

While deep core drilling offers several key advantages in sample
recovery for Precambrian biomarker analysis, other potentially
complicating factors must be kept in mind. Individual drill cores
necessarily represent a finite amount of material deposited at each time
interval and cannot be used, as outcrop samples often can, to determine
the lateral variability of a stratigraphic unit. Because these projects
are costly and time consuming, it is often not practical to drill a
series of cores in a small area. A central concern when studying drill
cores is that the samples will necessarily have come into contact with
drilling equipment and fluids. It is essential that the
hydrocarbon-based lubricants typically used in diamond-bit core drilling
be eschewed, ideally in favor of pure water as a drilling fluid. Use of
water as the sole drilling fluid is difficult, and generally increases
the rate of equipment fatigue and failure. Nevertheless, the extremely
low quantities of extractable bitumen in high maturity Precambrian rocks
mean that the syngenetic signal can easily be overprinted by even
minimal hydrocarbon content in the drilling fluid. Any additive or
substance to be put into the borehole should be pre-screened for
hydrocarbon and biomarker content. Some commercial products, such as
those based on plant oils, are less likely to be confused with
indigenous Precambrian biomarkers because they are composed primarily of
polar compounds and display immature isomer distributions. Other
products, advertised as 'hydrocarbon free', often contain quantities of
petroleum-derived biomarkers that could compromise trace level analysis.
Even for cores drilled under the cleanest practicable conditions, we
have found that the outer surfaces of core pieces bear signals of low
level contamination, necessitating their removal (see Sections 2.3 and
4.1).

From hundreds of meters of drill core, only a small subset of the
recovered stratigraphy can feasibly be analyzed. We chose our
sub-samples for biomarker work on the basis of several criteria.
Originally, visibly organic-rich strata (primarily fine grained black
shales, often with abundant pyrite) were sampled because we expected
that they would yield the most extractable bitumen. Ultimately, we found
little correlation between bitumen yield and whole rock organic matter
content, with more carbonate-rich beds sometimes having as much
extractable hydrocarbon as shales (Waldbauer et al., 2007). Hence, we
consider it useful to sample all lithofacies intersected in a given
core, particularly because contrasts between siliciclastic and
carbonate-dominated depositional environments are a potentially strong
paleoenvironmental signal (Eigenbrode, 2004). Similarly, we sampled
across formation and sequence boundaries to explore how changes in
sediment sourcing and deposition might be recorded in biomarker content.

2.2.  Materials 

In general, it is important to minimize the number of external surfaces
that touch Archean rock samples. Each surface contacted represents an
opportunity for contaminant hydrocarbons to be introduced into the
samples. Surfaces that samples do contact are either cleaned
exhaustively with organic solvents or thoroughly combusted.
Additionally, materials used to process high maturity samples are
reserved only for these samples and are kept separate from general
laboratory supplies.

Laboratory solvents including hexane, dichloromethane (DCM) and methanol
(MeOH) are high purity and organic free (OmniSolv, EMD Chemicals). Prior
to use, glassware, glass wool, aluminum foil and silica gel are
combusted at 550 °C (8 h) and quartz sand (Accusand, Unimin Corp.) is
combusted at 850 °C (12 h). De-ionized (DI) water and hydrochloric acid
(HCl) used in sample processing are extracted (×5) with DCM. Metal tools
used to process samples are rinsed with DI water and cleaned (×5 each)
with MeOH, DCM and hexane. Crushing tools (see Section 2.3) are scrubbed
with combusted quartz sand, rinsed with DI water and sonicated for 30
min each in MeOH and DCM.

2.3.  Sample  preparation 

Quarter core samples (1/4 NQ size, ~47.6 mm whole core diameter) were
approximately 50 to 200 g in weight and 20 to 50 cm in length. They were
processed in batches of six with at least one procedural sand blank per
set. Samples were labeled according to core name and downcore depth in
meters (e.g., GKF 230). We found that foreign contaminants are present
on the outside surfaces of the cores and that it is necessary to remove
at least the outer 3 to 5 mm of core material to remove these
hydrocarbons (see Section 4.1). To do this, samples are cut using a
table saw with a water lubricated, diamond edge blade (UKAM).
DCM-extracted DI water is used as the blade lubricant. Between samples
the saw is washed in DI water and the blade is washed and sonicated for
20 min each in MeOH and DCM. After cutting, to remove potential
surficial contaminants, the trimmed core pieces are sonicated for 20 min
each in DCM-extracted DI water, MeOH, and DCM.

L.S. Sherman et al. / Organic Geochemistry 38 (2007) 1987-2000

After cutting the samples, it is necessary to crush them to a fine
powder prior to extraction (see Section 4.1.1). We first crush samples
to pieces of ca. 1/4 inch diameter using a 300 series stainless steel
mortar and pestle (MIT Machine Shop design, Fig. 1a). Samples are then
crushed to a fine powder (sub-140 mesh) using a stainless steel puck
mill modeled after a SPEX 8507 steel puck mill (MIT Machine Shop design,
Fig. 1b). Between samples, the mortar, pestle and puck mill are scrubbed
with fired sand to remove particulate matter, washed with DI water and
sonicated for 30 min each in MeOH and DCM.

2.4.  Extraction  and  fractionation 

After crushing, the powdered samples (40-90 g) are divided into 60 ml
vials (ca. 15 g powder per vial). DCM is added to each vial (ca. 25-30
ml), the vials are sonicated for 30 min and the bitumen is collected.
More solvent is added and the process is repeated (see Section 4.2). It
is then necessary to filter the extracts over a small amount of silica
gel (ca. 3 cm) in a wide bore glass column to remove particulates. The
bitumens are next treated with acid-activated copper shot (Alfa Aesar)
to remove elemental sulfur and are loaded on to Pasteur pipette silica
columns (ca. 8 cm of silica). The saturated, aromatic and polar
hydrocarbons are separated using hexane (1 column volume), hexane/DCM (4
column volumes, 8/2) and DCM/MeOH (4 column volumes, 7/3), respectively.

2.5.  Preparation  of  bitumen  II 

To prepare demineralized powder, ca. 30 g of extracted rock powder is
placed in Teflon tubes which are cleaned of organics first in aqua regia
and then with MeOH and DCM (ca. 15 g powder per tube). DCM-extracted HCl
(6 N) is added and, after agitation, the acid is left to react for 24-48
h to remove carbonates. After pouring off the HCl, 48% hydrofluoric acid
(HF) is added for at least 72 h to dissolve silicates. Finally, after
the HF is decanted, 6 N DCM-extracted HCl is again added to remove
fluoride precipitates. Throughout this process, to aid acid digestion,
the tubes are periodically agitated. After 24 h in HCl, the samples are
washed at least ×5 with DCM-extracted water. Once dry, the powders are
extracted using sonication as described above and the resulting bitumen
is analyzed.

2.6. Gas chromatography (GC-MS) and metastable reaction monitoring (MRM) mass spectrometry

After separation, the fractions are carefully dried to a volume of
roughly 80 µl. Standards are added as follows: 10 ng D4
(D4-ααα-ethylcholestane, Chiron) and 1 µg aiC22 (3-methylheneicosane,
99+%, ULTRA Scientific) to the saturated fraction and 413 ng D14
(D14-para-terphenyl, 98 at.% D, Cambridge Isotope) to the aromatic
fraction.

Fig. 1. Photographs of rock crushing equipment. (a) Stainless steel
mortar and pestle with pencil for scale. (b) Stainless steel puck mill
with pencil for scale.

The saturated fractions are analyzed using GC-MS in full scan mode and
metastable reaction monitoring GC-MS (MRM-GC-MS-MS; Table 1). The
aromatic fractions are analyzed using GC-MS with selected ion
monitoring. All the analyses are conducted with a Micromass AutoSpec
Ultima coupled to an Agilent 6890N gas chromatograph, fitted with a DB-1
fused silica column (60 m × 0.25 mm i.d., 0.25 µm film thickness, J&W
Scientific) and using He as carrier gas. The mass spectrometer source
operates at 250 °C under electron ionization conditions (70 eV) and an
accelerating voltage of 8 kV is used. Full scan analyses are conducted
over a range of m/z 50-600. Peaks are integrated manually and quantified
on the basis of comparison to the internal standards (using m/z 85 from
full scan analyses).

Table 1
MRM-GC-MS-MS precursor to product transitions

Precursor mass (Da)^a   Product mass (Da)^a   Biomarkers
262.265                 191.179               C19 Tricyclic terpane
276.280                 191.179               C20 Tricyclic terpane
290.296                 191.179               C21 Tricyclic terpane
304.312                 191.179               C22 Tricyclic terpane
318.328                 191.179               C23 Tricyclic terpane
330.328                 191.179               C24 Tetracyclic terpane
332.344                 191.179               C24 Tricyclic terpane
346.359                 191.179               C25 Tricyclic terpane
360.374                 191.179               C26 Tricyclic terpane
358.353                 217.196               C26 Steranes
372.376                 217.196               C27 Steranes
386.391                 217.196               C28 Steranes
404.432                 221.221               D4 C29 Sterane^b
400.407                 217.196               C29 Steranes
414.423                 217.196               C30 Steranes
426.421                 205.194               Me-C30 Hopanes^c
370.359                 191.179               C27 Hopanes
440.437                 205.194               Me-C31 Homohopanes
384.374                 191.179               C28 Hopanes
454.452                 205.194               Me-C32 Homohopanes
398.390                 191.179               C29 Hopanes
468.467                 205.194               Me-C33 Homohopanes
412.406                 191.179               C30 Hopanes
482.482                 205.194               Me-C34 Homohopanes
426.421                 191.179               C31 Homohopanes
496.496                 205.194               Me-C35 Homohopanes
440.437                 191.179               C32 Homohopanes
454.452                 191.179               C33 Homohopanes
468.467                 191.179               C34 Homohopanes
482.482                 191.179               C35 Homohopanes

^a Precursor and product mass in Daltons.
^b "D4 C29 sterane" is the internal standard, D4-ααα-ethylcholestane.
^c "Me-Cx" steranes and hopanes are methylated versions of the
indicated compounds.
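As a plausibility check on Table 1, the precursor masses can be compared with monoisotopic masses calculated from molecular formulas. The sketch below is illustrative only and is not part of the published method: it assumes each precursor is the molecular ion of a saturated hydrocarbon with formula CnH(2n+2-2r), where r is the number of rings (3 for tricyclic terpanes, 4 for tetracyclic terpanes and steranes, 5 for hopanes), and it ignores the mass of the lost electron. The function name and the choice of compounds are hypothetical.

```python
# Monoisotopic atomic masses (u) of the lightest stable isotopes.
MASS_C = 12.0
MASS_H = 1.00782503
MASS_D = 2.01410178


def saturated_hydrocarbon_mass(n_carbon, n_rings, n_deuterium=0):
    """Monoisotopic mass of a saturated hydrocarbon CnH(2n+2-2r).

    n_deuterium hydrogens are replaced by deuterium, as in a labeled
    internal standard. This is the neutral-molecule mass; the electron
    mass (about 0.0005 u) is ignored.
    """
    n_hydrogen_total = 2 * n_carbon + 2 - 2 * n_rings
    n_protium = n_hydrogen_total - n_deuterium
    return n_carbon * MASS_C + n_protium * MASS_H + n_deuterium * MASS_D


# (label, carbons, rings, deuteriums, precursor mass listed in Table 1)
CHECKS = [
    ("C19 tricyclic terpane", 19, 3, 0, 262.265),
    ("C24 tetracyclic terpane", 24, 4, 0, 330.328),
    ("C27 sterane", 27, 4, 0, 372.376),
    ("D4 C29 sterane", 29, 4, 4, 404.432),
    ("C30 hopane", 30, 5, 0, 412.406),
]

for label, n_c, n_r, n_d, listed in CHECKS:
    calculated = saturated_hydrocarbon_mass(n_c, n_r, n_d)
    print(f"{label:24s} calculated {calculated:.3f}  listed {listed:.3f}")
```

Under these assumptions the calculated values for the compounds checked here agree with the listed precursor masses to within a few thousandths of a Dalton.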

3.  Typical  results 

Agouron Archean core samples ranged from 34.1 to 95.6 g rock powder and
yielded between 0.31 and 21.1 µg of bitumen (saturated hydrocarbons and
cyclic terpenoid biomarkers). By weight, the bitumen content ranges from
13.9 to 604.6 ppb whole rock or 0.34 to 23.9 ppm organic matter. All the
Archean bitumens yielded broadly similar types and abundances of
hydrocarbons. The compound classes detected, and their overall
abundances and relative amounts of rearranged isomers, are
characteristic of highly thermally mature organic matter. Typical
biomarker and saturated hydrocarbon chromatograms from a representative
sample (GKF 705.3) are presented in Fig. 2. The Archean bitumens are
composed almost entirely of saturated and aromatic hydrocarbons; polar
and partially unsaturated compounds are below detection limit. The
saturated fractions are composed primarily of low molecular weight
n-alkanes with condensate-like carbon number distributions (Fig. 2b).
The aromatic fractions are dominated by low molecular weight polycyclic
hydrocarbons and their alkylated homologs.
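The arithmetic linking integrated peak areas to the yields and ppb values quoted above can be sketched as follows. This is a minimal illustration, not the authors' calculation: it assumes a relative response factor of 1 between analyte and internal standard unless one is supplied, the function names are hypothetical, and the peak areas and rock mass in the example are invented for demonstration.

```python
def quantify_ng(analyte_area, standard_area, standard_ng, response_factor=1.0):
    """Analyte mass (ng) from its peak area relative to an internal standard.

    response_factor is the analyte response relative to the standard;
    1.0 (equal response) is an assumption, not a measured value.
    """
    if standard_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return analyte_area / standard_area * standard_ng / response_factor


def ppb_whole_rock(bitumen_ng, rock_g):
    """Bitumen content in ppb by weight: ng of bitumen per g of rock."""
    if rock_g <= 0:
        raise ValueError("rock mass must be positive")
    return bitumen_ng / rock_g


# Hypothetical example: summed analyte area equal to twice the area of a
# 1 ug (1000 ng) internal standard, extracted from 50 g of rock powder.
total_ng = quantify_ng(analyte_area=2.0e6, standard_area=1.0e6, standard_ng=1000.0)
print(total_ng, "ng bitumen")
print(ppb_whole_rock(total_ng, rock_g=50.0), "ppb whole rock")
```

Because 1 ng per g equals 1 ppb by weight, no further unit conversion is needed once the yield is expressed in nanograms.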

Cyclic terpenoid biomarker compounds are present in ppb to sub-ppb
quantities; cheilanthanes, hopanes and steranes are present at roughly
equal abundances. Thermodynamically-favored isomers from each of these
terpenoid classes are found in greater relative abundance. Tricyclic
biomarkers, including 13β(H),14α(H) homologs of C19 to C26
cheilanthanes, and a C24 tetracyclic terpane are present in all of the
samples. Gammacerane is also found in low abundance in many of the
samples. C27 to C35 hopanes of several series including
17α(H),21β(H)-hopanes, 17β(H),21α(H)-hopanes (moretanes), 2α- and
3β-methylhopanes, 25- and 30-norhopane, and C28 dinorhopanes are present
(Fig. 2a). C26 to C30 steranes are also found in every sample, those
with 27, 28 and 29 carbons being the most abundant (Fig. 2a). Among
these, 5α(H),14α(H),17α(H) and 5α(H),14β(H),17β(H) regular steranes
(both 20R and 20S epimers) and rearranged 13β(H),17α(H)-diasteranes are
the dominant isomers.

Fig. 2. Typical results (from sample GKF 705.3). (a) Typical Archean
biomarker traces. C27 to C29 steranes and C27, C29 and C30 hopane traces
are shown. Sand blank chromatograms shown below each sample trace are to
the same absolute scaling. Relative retention times are shown in min.
Compound abbreviations: C30-αβ = 17α(H),21β(H)-hopane; 30-nor =
17α(H),21β(H)-30-norhomohopane; C30-βα = 17β(H),21α(H)-hopane; C29-αβ =
17α(H),21β(H)-30-norhopane; C29 Ts = 18α(H),21β(H)-30-norneohopane;
C29-βα = 17β(H),21α(H)-30-norhopane; Ts = C27 18α-trisnorhopane II; Tm =
C27 17α-trisnorhopane; diasteranes (20S and 20R epimers) = 13β(H),17α(H)
diasterane; ααα (20S and 20R epimers) = 5α(H),14α(H),17α(H) sterane; αββ
(20S and 20R epimers) = 5α(H),14β(H),17β(H) sterane. (b) Typical Archean
saturated hydrocarbon trace (from sample GKF 705.3; m/z 85). The
internal standard (1 µg 3-methylheneicosane) and the regular n-alkane
peaks (C13 to C24) are marked on the trace. Compound abbreviations: Pr =
pristane; Ph = phytane.

Sand blanks are largely free of hydrocarbons and contain very low
abundances of terpenoid biomarkers and n-alkanes. Average sample to
blank ratios were calculated for each compound class using the 17
samples and four sand blanks that were processed using the techniques
described in Section 2. Overall, the average sample to blank ratio for
cyclic terpenoid biomarkers is 11.58 (± 7.77, 1σ SD). Separated by
compound class, the sample to blank ratio is lower for cheilanthanes
(4.88 ± 1.57, 1σ SD) and higher for steranes (19.70 ± 15.82, 1σ SD) and
hopanes (16.38 ± 14.10, 1σ SD). This is primarily because the sand
blanks are dominated by cheilanthane biomarkers. Overall, the average
sample to blank ratio for n-alkanes is 3.67 (± 2.81, 1σ SD). This value
may be lower than the values for the cyclic biomarkers because n-alkanes
are particularly abundant and widespread in the modern environment as a
result of fossil fuel burning and petrochemical use. Sand blank alkane
yields (n = 4) ranged from 83.7 to 805 ng, whereas sand blank biomarker
yields ranged from 0.93 to 5.18 ng; these quantities are significantly
lower than the sample bitumen yields. It is possible that further
refinement of the methods could increase sample to blank ratio values by
eliminating low levels of carry-over between samples and blanks and by
reducing sample contact with hydrocarbon aerosols.
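One way such an average sample to blank ratio and its standard deviation could be computed is sketched below. The text does not state the exact procedure, so this is an assumption-laden illustration: each sample yield is divided by the mean yield of the sand blanks, and the mean and sample standard deviation of those ratios are reported. The function names are hypothetical and the yields in the example are invented, not data from this study.

```python
from statistics import mean, stdev


def sample_to_blank_ratios(sample_yields_ng, blank_yields_ng):
    """Ratio of each sample yield to the mean blank yield (assumed scheme)."""
    blank_mean = mean(blank_yields_ng)
    if blank_mean <= 0:
        raise ValueError("mean blank yield must be positive")
    return [s / blank_mean for s in sample_yields_ng]


def summarize(ratios):
    """Mean and 1-sigma sample standard deviation of a list of ratios."""
    return mean(ratios), stdev(ratios)


# Hypothetical biomarker yields in ng for one compound class.
samples_ng = [10.0, 20.0, 30.0]
blanks_ng = [1.0, 3.0]

ratios = sample_to_blank_ratios(samples_ng, blanks_ng)
avg, sd = summarize(ratios)
print(f"sample/blank = {avg:.2f} (± {sd:.2f}, 1σ SD)")
```

Dividing by the mean blank rather than pairing each sample with a single blank is a design choice made here for simplicity; pairing samples with the blank from the same processing batch would be an equally reasonable alternative.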

4.  Justification  of  methodology 

In developing the techniques described in Section 2, we conducted
several methodological experiments to test the efficacy of our approach.
We sought throughout to minimize the number of sample processing steps
and the amount of equipment required, as each handling step is an
opportunity for contaminants to be introduced. This section presents the
results of several of these methodological tests, demonstrating the
necessity of, and best practices for, each sample preparation step.

4.1.  Sample  preparation:  cutting 

Because anthropogenic petroleum products are ubiquitous in the modern
environment, particularly around machinery and in industrialized
settings such as research labs, even cores obtained without the use of
drill fluids and analyzed soon after recovery may contain foreign lipids
on their surfaces. To determine if such compounds are present on the
outsides of drill core pieces and, if so, how best to eliminate the
contamination, we pursued an approach initiated by Brocks (2001) by
comparing the extractable hydrocarbon contents of exterior and interior
portions of the core samples.

We cut a quarter core piece (GKF 1326.25) into three pieces using a
table saw with a diamond-edged blade according to the methods described
in Section 2.3 (Fig. 3). After cutting, the three pieces were processed
separately. The first, called "untreated," was left intact without
further preparation. The second, called "rinsed," was sonicated for 30
min each in DCM-extracted DI water and DCM prior to further processing.
This DCM was also retained and analyzed (called "rinse solvent"). The
outer surfaces (3 to 5 mm) of the third piece were cut off and separated
into the flat sides, called "outside flat," and the curved surfaces,
called "outside curve." The internal portion (diameter ~9.5 cm) was
called "inside." The samples from these five treatments were then
processed and analyzed in parallel according to the methods described in
Section 2.

Fig. 3. Diagram of quarter core sample (GKF 1326.25) used in cutting
experiment (described in Section 4.1). Bold and dashed lines represent
cuts made with table saw.

Chromatograms depicting the saturated hydrocarbons from the bitumens of
these six samples are presented in Fig. 4. Although all the fractions
are dominated by n-alkanes, their distributions and abundances are
markedly different between the treatments. The hydrocarbons from the
inside piece are low in abundance and have a regular, condensate-like
carbon number distribution that peaks at about C18. In contrast, all the
outside surfaces, the rinsed piece and the untreated piece have higher
abundances of hydrocarbons that show strong even/odd carbon number
preferences. Such preferences should not persist in high maturity rocks
(Peters et al., 2005). These bitumens also have abundant n-alkanes
greater than C22 and some display bimodal distribution patterns. These
features are especially pronounced in the rinse solvent. Additionally,
there are other compounds, including branched alkanes, in these pieces
that are not present in the inside piece. The occurrence and abundance
of these compounds differs between the treatments; the outside, rinsed
and untreated pieces are extremely heterogeneous. In contrast, the
compounds found in the Agouron and Deep Time Drilling Project inner
cores are largely homogeneous and similar to those found in the inside
piece of GKF 1326.25.
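The even/odd carbon number preference described above can be expressed numerically. The sketch below uses a deliberately simple ratio, the summed peak area of even-carbon n-alkanes divided by that of odd-carbon n-alkanes over a chosen range. This is not a published index and not the authors' method; the function name, the carbon range and the peak areas are all hypothetical. A smooth, mature distribution should give a value near 1, while a strong even-carbon preference gives a value well above 1.

```python
def even_to_odd_ratio(peak_areas, first_carbon=14, last_carbon=25):
    """Summed even-carbon n-alkane area divided by summed odd-carbon area.

    peak_areas maps carbon number to integrated peak area. Only carbon
    numbers from first_carbon to last_carbon (inclusive) are used.
    """
    in_range = {c: a for c, a in peak_areas.items()
                if first_carbon <= c <= last_carbon}
    even = sum(a for c, a in in_range.items() if c % 2 == 0)
    odd = sum(a for c, a in in_range.items() if c % 2 == 1)
    if odd == 0:
        raise ValueError("no odd-carbon peaks in the selected range")
    return even / odd


# Hypothetical peak areas keyed by carbon number (not data from this study).
smooth = {c: 100.0 for c in range(14, 26)}
even_biased = {c: (150.0 if c % 2 == 0 else 50.0) for c in range(14, 26)}

print(round(even_to_odd_ratio(smooth), 2))
print(round(even_to_odd_ratio(even_biased), 2))
```

The range is chosen so that it contains equal numbers of even and odd homologs; an unbalanced range would bias the ratio even for a perfectly smooth distribution.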

Fig. 4. Saturated hydrocarbon traces (m/z 85) of five treatments of GKF
1326.25 and the solvent used to clean the "rinsed" piece. All
chromatograms are shown to the same absolute scaling. Relative retention
times are shown in min. The internal standard (1 µg 3-methylheneicosane)
and the regular n-alkane peaks are marked on each trace.

Cyclic terpenoid biomarker analyses of the bitumens from GKF 1326.25
show parallel patterns. Several series of hopanes and steranes are
present in all of the samples. These include 17α(H),21β(H)-hopanes, 2α-
and 3β-methylhopanes, 5α(H),14α(H),17α(H) and 5α(H),14β(H),17β(H)
regular steranes (both 20S and 20R) and rearranged
13β(H),17α(H)-diasteranes. Although similar compounds are found in each
of the treatments, the relative abundances of rearranged molecules are
distinct. In general, the bitumen from the inside piece contains the
highest abundance of thermodynamically-favored isomers. For example, C27
18α-trisnorhopane II (Ts) is more stable than C27 17α-trisnorhopane
(Tm). In samples with a common organic matter source, the ratio of Ts to
Tm increases with maturity. As shown in Table 2, the value of Ts/Tm is
highest in the inside piece and significantly lower in the other
treatments. The rinsed piece has the second highest value. The ratio of
diasteranes to regular steranes also increases with thermal maturity
(e.g. Peters et al., 2005). As shown in Table 2, the ratio of
diasteranes to regular steranes (C27-C29) is highest in the inside piece
and second highest in the rinsed piece.
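The ordering of treatments described above can be reproduced directly from the values reported in Table 2. The short sketch below is illustrative only: the dictionary and function names are hypothetical, and comparing a difference between two treatments against the quoted 2σ value is a rough check assumed here, not a statistical test used by the authors.

```python
# Ratios as reported in Table 2 for the five treatments of GKF 1326.25.
TS_TM = {
    "inside": 2.15, "rinsed": 1.70, "outside flat": 1.44,
    "outside curve": 1.34, "untreated": 0.88,
}
DIA_REG = {
    "inside": 1.30, "rinsed": 0.84, "outside flat": 0.66,
    "outside curve": 0.37, "untreated": 0.54,
}
# 2-sigma standard deviations quoted in the Table 2 footnotes.
TWO_SIGMA = {"Ts/Tm": 0.08, "dia/reg": 0.04}


def rank_treatments(ratios):
    """Treatment names ordered from highest to lowest ratio."""
    return sorted(ratios, key=ratios.get, reverse=True)


def differs_beyond(ratios, first, second, two_sigma):
    """True if two treatments differ by more than the quoted 2-sigma value."""
    return abs(ratios[first] - ratios[second]) > two_sigma


print(rank_treatments(TS_TM))
print(rank_treatments(DIA_REG))
print(differs_beyond(TS_TM, "inside", "rinsed", TWO_SIGMA["Ts/Tm"]))
```

Both rankings place the inside piece first and the rinsed piece second, in line with the interpretation that the interior bitumen carries the most thermally mature signature.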

Our data reveal that the compounds found in the outside, untreated and
rinsed pieces are not entirely syngenetic to the Archean sediment and
are not from a single identifiable source. The hydrocarbons from
bitumens associated with outer portions of the cores are highly
heterogeneous and almost certainly include material introduced during
drilling and/or storage. Only the signature from the inside piece likely
represents original, syngenetic, highly mature Archean hydrocarbons.
Although sonicating core pieces in DCM appears to move some of the
exogenous compounds from the core to the rinse solvent, only cutting off
the outer surfaces of the core adequately eliminates the foreign
contaminants. Therefore, we believe that it is necessary to cleanly cut
off the outer 3 to 5 mm of core material prior to analysis.

4.1.1.  Sample  preparation:  crushing 

After determining that it was necessary to remove the outer core
surfaces, we conducted an experiment to see if the bitumen could be
extracted from the inner core solely by sonication in organic solvent
without further treatment. We sonicated an inner core piece twice in DCM
for 30 min and processed the bitumen as described in Section 2. We were
unable to detect biomarkers (data not shown). This result was expected
because in high maturity sediments, bitumen is held tightly in a matrix
of recalcitrant kerogen and mineral material. It is therefore largely
inaccessible to organic solvents in whole rock form and only becomes
extractable after the samples are crushed to a fine powder.

Rock crushing equipment typically used to prepare samples for organic
geochemical analysis proved unsuitable for the extremely low blank
requirements of Precambrian biomarker analysis. For example, samples
crushed in an alumina ceramic puck mill (SPEX 8505) were found to
contain similar quantities of biomarkers to sand blanks processed
alongside them. Because of its size and construction, this type of mill
cannot readily be cleaned by submersion and sonication in organic
solvents. This may result in less effective cleaning between samples and
so can lead to cross-contamination due to sample carry over.

Table 2
Cutting experiment biomarker ratios

Biomarker ratio                  Inside   Rinsed   Outside flat   Outside curve   Untreated
Ts/Tm^a                          2.15     1.70     1.44           1.34            0.88
ΣC27-29 dia/regular steranes^b   1.30     0.84     0.66           0.37            0.54

^a "Ts/Tm" is the ratio of C27 18α-trisnorhopane II (Ts) to C27
17α-trisnorhopane (Tm); 2σ standard deviation = 0.08.
^b "Σ C27-C29 dia/regular steranes" is the ratio of the sum of C27 to
C29 diasteranes to the sum of the C27 to C29 regular steranes; 2σ
standard deviation = 0.04.

To avoid this problem, we developed a set of solid stainless steel tools that can be cleaned by sonication in solvent. The tools also maximize efficiency and minimize physical stress on the investigator. We designed a stainless steel mortar and pestle (Fig. 1a) that can be clamped down to an aluminum foil-covered board. Samples can be easily crushed to small pieces (ca. 1/4” diameter) without exposure to potentially contaminating surfaces and without sample loss. We then use a small stainless steel puck mill (Fig. 1b, 9 cm diameter × 5.4 cm height) to crush the samples to a fine powder.

4.2.  Extraction  and  fractionation 

We tested two methods for solvent extraction of crushed rock powder. Initially, a Dionex accelerated solvent extractor (ASE) was used. This instrument has the advantage of performing automated extraction under elevated temperature and pressure, potentially increasing solvent access to the rock matrix. However, we found that an ASE introduces low, but unpredictable, levels of contamination. For example, ASE-extracted sand blanks can contain high abundances of biomarkers of similar composition to those found in samples processed alongside them. This may be because there is plastic tubing in an ASE that is not optimally suited to trace hydrocarbon extraction. Additionally, carry over may occasionally occur between samples if small amounts of residual extract are not fully flushed through the machine.

In an effort to bypass this unpredictable source of contamination, we conducted an experiment to see if bitumen could effectively be liberated from high maturity powder by sonication. We divided the outside cuttings of one sample (“GKP 332.5 outside”) and a sand blank into two equal parts. One half of each was extracted using an ASE; the other was extracted using two rounds of sonication in DCM. For this experiment, and in general, we chose to use two rounds of sonication because we found that subsequent extractions yield undetectable quantities of hydrocarbons. For both the sample and the blank, we found that extraction using the ASE produced approximately a twofold increase in biomarker yield (Table 3). Given the cost of unpredictable contamination introduced by use of an ASE, the decrease in yield with sonication extraction is acceptable. When trace contaminants and sample carry over are critical concerns, high maturity bitumens can be effectively extracted using sonication methods as described in Section 2.4.
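The trade-off between yield and blank level in the extraction experiment can be made explicit with a little arithmetic. The sketch below is illustrative only: the yields are transcribed from Table 3, while the dictionary layout and the function name `summarize` are assumptions introduced here, not part of the original method.

```python
# Yields in ng, transcribed from Table 3 (one sample and one sand blank).
YIELDS_NG = {
    "hopanes": {"sample_son": 2.39, "sample_ase": 5.49,
                "blank_son": 0.50, "blank_ase": 1.83},
    "steranes": {"sample_son": 4.81, "sample_ase": 9.08,
                 "blank_son": 1.05, "blank_ase": 2.61},
}


def summarize(row):
    """Return ASE/sonication yield ratios and blank-to-sample fractions."""
    return {
        "ase_over_son_sample": row["sample_ase"] / row["sample_son"],
        "ase_over_son_blank": row["blank_ase"] / row["blank_son"],
        "blank_fraction_son": row["blank_son"] / row["sample_son"],
        "blank_fraction_ase": row["blank_ase"] / row["sample_ase"],
    }


# Hypothetical summary structure, keyed by compound class.
SUMMARY = {name: summarize(row) for name, row in YIELDS_NG.items()}

for name, stats in SUMMARY.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

For this single experiment, the blank makes up a smaller fraction of the sample signal with sonication than with the ASE for both compound classes, which is consistent with accepting the lower sonication yield.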

4.3.  Preparation  of  bitumen  II 

As discussed above, studies of high maturity organic matter must take into account the possibility not only of laboratory contamination but also of migration of geologically younger hydrocarbons into the sediment. These mobile contaminants are most likely to be extractable from the whole rock powder as a primary extract (termed “bitumen I”). To assess the syngeneity of our primary extracts, we prepared a set of demineralized powders and re-extracted them to obtain a second bitumen, termed “bitumen II.” Because the hydrocarbons in bitumen II are not originally solvent extractable and are considered more closely associated with the kerogen and mineral matrix of the rocks, they are less likely to be contaminated by external, migrated bitumen. If bitumen I and bitumen II extracts from a given rock sample have similar compositions, it would strongly suggest that they derive from the same pool of syngenetic organic matter.
Table 3
Extraction experiment biomarker yields

                            Sonication extraction (a)   ASE (b)             Sonication extraction   ASE
                            GKF 332.5 outside           GKF 332.5 outside   sand blank              sand blank
Extractable hopanes (ng)    2.39                        5.49                0.50                    1.83
Extractable steranes (ng)   4.81                        9.08                1.05                    2.61

(a) Yield for samples extracted by two rounds of sonication for 30 min in DCM.
(b) Yield for samples extracted using an accelerated solvent extractor.


In this study we demineralized and re-extracted seven samples. Hydrocarbon abundance, biomarker carbon number distributions, degree of isomerization and indicated thermal maturity are similar between the two sets of bitumens. A set of biomarker chromatograms from a representative sample (GKF 314.65) is shown in Fig. 5. The same compounds, in approximately the same relative abundances, are found in both bitumens. For example, Ts/Tm and the degree of diasterane isomerization (“dia S/R”) are nearly identical between bitumen I and bitumen II for this sample (Fig. 5).

Despite these similarities, there are systematic differences between bitumens I and II. These differences may be explained by a closer association of the latter with the rock mineral matrix. For example, the methyl rearrangement reactions that lead to diasteranes are thought to be catalyzed by clay minerals (Rubenstein et al., 1975; Sieskind et al., 1979). If bitumen II is more closely associated with the mineral matrix and clays, it would be likely to contain greater abundances of diasteranes than bitumen I. As shown in Fig. 5, diasterane/regular sterane ratio values are higher for bitumen II than bitumen I, a pattern seen for most samples in this study. It is especially important when extracting trace amounts of hydrocarbons to conduct such assessments of syngeneity (see Section 5).

Fig. 5. Typical biomarker results (from sample GKF 314.65) from MRM-GC-MS-MS analysis of bitumen I (a) and bitumen II (b). Relative retention times are shown in min. C27 to C29 steranes and C27 hopane traces are shown; MRM transitions and relevant ratios for bitumen I (“BI”) and bitumen II (“BII”) are indicated between corresponding traces. Sand blank chromatograms are shown below each sample trace to the same absolute scaling. Compound abbreviations are as in Fig. 2; “dia S/R” is the ratio of 20S to 20R epimers of 13β(H),17α(H) diasterane; “dia/reg” is the ratio of 13β(H),17α(H) diasteranes (20S and 20R epimers) to 5α(H),14α(H),17α(H) and 5α(H),14β(H),17β(H) regular steranes (20S and 20R epimers); “Ts/Tm” is the ratio of C27 18α-trisnorhopane II (Ts) to C27 17α-trisnorhopane (Tm); “βα/αβ” is the ratio of 17β(H),21α(H) hopane (moretane) to 17α(H),21β(H) hopane.

4.4.  GC-MS  and  MRM-GC-MS-MS 

Analysis of high maturity sediments with trace quantities of biomarkers is facilitated by recent improvements in instrumentation. Highly sensitive MS methods such as MRM-GC-MS-MS can reliably detect sub-ppb quantities of hydrocarbons. Certain molecules, including cyclic terpenoid biomarkers, fragment to known product ions in a mass spectrometer via predictable pathways determined by their chemical structures. Appropriate precursor-product m/z values are summarized in Table 1. The selectivity of MRM techniques enhances signal-to-noise ratio, allowing reliable detection of biomarkers at much lower abundances than is possible using other approaches such as GC-MS in full scan or selected ion monitoring modes.
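As an illustration of how precursor-product pairs follow from chemical structure, the sketch below derives nominal precursor masses for saturated steranes (four rings) and hopanes (five rings) from their carbon numbers. The product ions m/z 217 for steranes and m/z 191 for hopanes are the values conventionally used for these compound classes and are assumed here; Table 1 of the source remains the authoritative list and is not reproduced. The names `nominal_mass` and `mrm_transition` are hypothetical.

```python
# Assumed class properties: (number of rings, conventional product ion m/z).
CLASSES = {"sterane": (4, 217), "hopane": (5, 191)}


def nominal_mass(carbons, rings):
    """Nominal mass of a saturated cyclic hydrocarbon CnH(2n + 2 - 2*rings)."""
    hydrogens = 2 * carbons + 2 - 2 * rings
    return 12 * carbons + hydrogens


def mrm_transition(compound_class, carbons):
    """Return (precursor m/z, product m/z) for one carbon number."""
    rings, product = CLASSES[compound_class]
    return nominal_mass(carbons, rings), product


for carbons in (27, 28, 29):
    print("C%d sterane" % carbons, mrm_transition("sterane", carbons))
print("C27 hopane", mrm_transition("hopane", 27))
```

Monitoring only such structure-specific pairs is what gives MRM its selectivity relative to full scan acquisition.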

5.  Discussion:  data  analysis  and  assessment  of 
syngeneity 

The methods described in Section 2 have greatly improved our confidence in our analysis of high maturity Precambrian organic matter. These methods produce low levels of background contamination and relatively high sample biomarker yield. However, it is still important to scrutinize each analysis for evidence of contamination and to conduct more general tests for syngeneity over the entire data set. The detected compounds must be demonstrably Precambrian in origin and must be derived from their host rocks to be meaningfully interpreted as molecular fossils.

First order evidence for contamination of Precambrian rock extracts would include the presence of biomarkers for later-evolving organisms, particularly polyterpenoids of higher plants; none of these were detected. Indigenous organic matter with a single source should also show consistent patterns of molecular rearrangement and isomerization between structural homologues (e.g., members of a carbon number series). The full range of saturated and aromatic hydrocarbons and biomarker compounds should be considered for each individual analysis because an aberrant signal might be observed in only one or a few compound classes. For the purposes of our study, which included assessing microbial diversity from some of the earliest well-preserved sedimentary sequences in the geologic record, samples that bore any anomalous signals were excluded. Thus, although 70 samples from the Agouron cores were processed, only 33 were included in the final database (Waldbauer et al., 2007). Samples were excluded for a variety of reasons. Some have bimodal n-alkane distributions, others contain biomarker abundances similar to a parallel blank, and others display inconsistent or characteristically immature ratios of rearranged biomarker compounds.
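The exclusion reasons listed above can be written down as an explicit screening routine. The following is a minimal sketch under stated assumptions, not the procedure used in the study: the record fields, the function name `exclusion_reasons`, the example records and in particular the sample-to-blank factor of 3 are hypothetical. The cut-offs of 1.0 simply encode the idea that an immature extract has more Tm than Ts and more regular steranes than diasteranes.

```python
from dataclasses import dataclass


@dataclass
class SampleRecord:
    name: str
    sample_ng: float          # biomarker yield of the sample
    blank_ng: float           # biomarker yield of the parallel blank
    ts_tm: float              # Ts/Tm ratio
    dia_reg: float            # diasterane/regular sterane ratio
    n_alkanes_bimodal: bool   # judged from the m/z 85 chromatogram


def exclusion_reasons(rec, min_sample_to_blank=3.0, min_ts_tm=1.0, min_dia_reg=1.0):
    """Return the list of reasons to exclude a sample (empty if none apply).

    All thresholds are hypothetical defaults chosen for illustration.
    """
    reasons = []
    if rec.n_alkanes_bimodal:
        reasons.append("bimodal n-alkane distribution")
    if rec.sample_ng < min_sample_to_blank * rec.blank_ng:
        reasons.append("biomarker abundance similar to parallel blank")
    if rec.ts_tm < min_ts_tm or rec.dia_reg < min_dia_reg:
        reasons.append("immature ratios of rearranged biomarkers")
    return reasons


# Hypothetical records, not data from the study.
clean = SampleRecord("hypothetical-clean", 5.0, 0.5, 2.1, 1.3, False)
suspect = SampleRecord("hypothetical-suspect", 1.0, 0.9, 0.6, 0.4, True)
print(clean.name, exclusion_reasons(clean))
print(suspect.name, exclusion_reasons(suspect))
```

Fixed thresholds of this kind are only a starting point; the consistency of isomerization across a carbon number series still has to be judged from the chromatograms themselves.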

For example, the sample depicted in Fig. 6 (GKP 741.54) was excluded because its biomarker and n-alkane chromatograms display characteristics of immature organic matter. Unlike in syngenetic Archean bitumens, several thermodynamically unfavored isomers, including C27 17α-trisnorhopane (Tm) and 5α(H),14α(H),17α(H) regular steranes (20S and 20R), are present at greater abundance than more thermodynamically favored rearranged isomers (Fig. 6a). Additionally, the degree of α/β isomerization and S/R epimerization is not consistent between sterane carbon numbers (Fig. 6a). Steranes from a single, syngenetic organic matter source that has been exposed to a set of temperature and pressure conditions should all be rearranged in the same manner. Finally, as shown in Fig. 6b, the saturated hydrocarbon fraction from this sample is not dominated by low molecular weight n-alkanes. Instead, it contains branched and unsaturated compounds at greater abundances than the internal standard (i.e., greater than 1 pg). As described in Section 3, these characteristics are not typical of Archean bitumen and suggest that this sample is contaminated.

In addition to this careful scrutiny of individual sample analyses, further intra- and inter-sample comparisons should be undertaken to demonstrate syngeneity. It is important not only that individual samples be free of contamination, but also that the bitumens are syngenetic to their host material. Comparisons between bitumens I and II, between stratigraphically correlated samples, and across significant stratigraphic boundaries, can assist with determination of syngeneity.

Samples may be demineralized and re-extracted (see Sections 2.5 and 4.3) to yield bitumen II fractions. Because bitumen II represents a pool of hydrocarbons that was not solvent-accessible in the crushed rock powder, it is less likely to be affected by mobile contaminants than bitumen I. Therefore, for a given sample, if bitumen I is similar in composition to bitumen II (with predictable differences as described in Section 4.3), it is likely to be derived from the same pool of syndepositional organic matter instead of from a separate source.

Fig. 6. Example of a potentially contaminated sample (GKF 741.54). Compound abbreviations and symbols are as in Figs. 2 and 5; “reg αββ S/R” is the ratio of 20S to 20R epimers of 5α(H),14β(H),17β(H) sterane. Relative retention times are shown in min. (a) Biomarker chromatograms of C27 to C29 steranes and C27 hopanes; sand blank traces are shown below each chromatogram to the same absolute scaling. (b) Chromatogram of saturated hydrocarbon fraction (m/z 85).

Comparisons between stratigraphically correlated samples from geographically distinct cores can also strongly support claims of syngeneity. If feasible, cores drilled at close proximity can be used to assess the lateral variability along the strike of a given unit. Such cores also allow for comparison of directly correlated samples, potentially strengthening claims of syngeneity. Given the resources necessary to obtain drill core material, it may be difficult to justify ‘re-sampling’ the same strata; nevertheless, we have found inter-core comparisons to be valuable tests of syngeneity. For example, in the case of the Agouron Griqualand Drilling Project, cores GKF and GKP were drilled 24 km apart through different but correlatable portions of a depositional system. This allows for the comparison of chemostratigraphic patterns and variations between correlated samples through a variety of depositional systems. By making such comparisons between GKP and GKF, we observed strong correlations in biomarker content, especially in the deepwater facies (Waldbauer et al., 2007). Inter-core correlations are strong evidence both for depositional control of organic matter content and the syngeneity of molecular fossils.

Comparison of Archean samples to younger units across an unconformity may also support claims of syngeneity. In cases where Phanerozoic sediments unconformably overlie Precambrian material, a strong geochemical contrast should exist. This contrast would be overprinted and diminished if contaminants were introduced during drilling, storage, or handling. The Agouron Drilling project recovered approximately 200 m of a Permo-Carboniferous diamictite that was deposited directly on top of the youngest Precambrian unit, creating a 2 billion year unconformity (Visser, 1989). Bitumens extracted from this diamictite are in fact geochemically distinct from the underlying Archean material. The younger material is much less thermally mature and substantially more heterogeneous. This clear contrast provides additional evidence that bitumens extracted from Archean units are in fact syngenetic and were not contaminated by hydrocarbons either from younger units or from the external environment.

All of the characteristics of each sample and its corresponding blank should be examined in tandem with complete data set analyses of syngeneity as described above. Given the low biomarker yields from Precambrian sediments and the ubiquitous environmental presence of petroleum-derived hydrocarbons, sample contamination can easily occur. Throughout data analysis, it remains up to the investigator to determine the best and most consistent course of action - a single set of criteria is unlikely to be applicable to all situations.

6.  Conclusions 

Key aspects of methods described in this paper would not have been feasible during the 1960s when the organic geochemistry of Precambrian rocks was first studied. Not only did researchers not initially recognize the prevalence of potential contaminants, but also the materials and instrumentation necessary to successfully process high maturity samples were not available. Materials such as high purity organic-free solvents were not commercially available and investigators had to distill their own (Hoering, 1967). Pure, isotopically labeled internal standards were similarly not available. Additionally, instrumentation such as highly sensitive GC-MS-MS was not refined until the 1980s.

The procedures presented take advantage of these advances in technology. While we believe that the methods are well suited for the analysis of trace amounts of hydrocarbons, as described above, each data set should be carefully examined for any evidence of contamination or sample carry over. Application of the techniques allows researchers to do now that which was impossible in the mid-20th century, that is, study demonstrably Precambrian high maturity organic matter to learn about microbial evolution and early life on Earth.
Proteorhodopsins (PRs) are retinal-containing proteins that catalyze light-activated proton efflux across the cell membrane. These photoproteins are known to be globally distributed in the ocean's photic zone, and they are found in a diverse array of Bacteria and Archaea. Recently, light-enhanced growth rates and yields have been reported in at least one PR-containing marine bacterium, but the physiological basis of light-activated growth stimulation has not yet been determined. To describe more fully PR photosystem genetics and biochemistry, we functionally surveyed a marine picoplankton large-insert genomic library for recombinant clones expressing PR photosystems in vivo. Our screening approach exploited transient increases in vector copy number that significantly enhanced the sensitivity of phenotypic detection. Two genetically distinct recombinants, initially identified by their orange pigmentation, expressed a small cluster of genes encoding a complete PR-based photosystem. Genetic and biochemical analyses of transposon mutants verified the function of gene products in the photopigment and opsin biosynthetic pathways. Heterologous expression of six genes, five encoding photopigment biosynthetic proteins and one encoding a PR, generated a fully functional PR photosystem that enabled photophosphorylation in recombinant Escherichia coli cells exposed to light. Our results demonstrate that a single genetic event can result in the acquisition of phototrophic capabilities in an otherwise chemoorganotrophic microorganism, and they explain in part the ubiquity of PR photosystems among diverse microbial taxa.

photoheterotrophy  |  rhodopsin  |  lateral  gene  transfer  |  marine  | 
metagenomics 

Proteorhodopsins (PRs) are retinal-binding membrane proteins belonging to the rhodopsin family. Prokaryotic members of this family include photosensors (sensory rhodopsins), transmembrane proton pumps (bacteriorhodopsins, xanthorhodopsin, and PRs), and transmembrane chloride pumps (halorhodopsins). Originally discovered in Archaea, rhodopsins were later identified in Gammaproteobacteria of the SAR86 group during a cultivation-independent genomic survey. Dubbed proteorhodopsin, this photoprotein functions as a light-activated proton pump when expressed in Escherichia coli in the presence of exogenously added retinal (1). Since then, numerous molecular surveys have demonstrated that PR genes are ubiquitous in bacteria inhabiting the ocean’s photic zone (2-9). An estimated 13% of bacteria in marine picoplankton populations, as well as a significant fraction of planktonic Euryarchaeota, contain a PR gene (4, 8). In a number of marine bacteria, retinal biosynthetic genes and PR are genetically linked, and their lateral transfer and retention appear to be relatively common events, indicating that the photosystem confers a significant fitness advantage (3, 4, 7, 10, 11). A recent report of light-stimulated growth in a PR-containing marine flavobacterium supports this hypothesis (11). Despite all of these observations, however, the various specific functions and physiological roles of diverse marine microbial PRs remain to be fully described.


To further characterize PR photosystem structure and function, we directly screened large-insert DNA libraries derived from marine picoplankton for visibly detectable PR-expressing phenotypes. In this report, we describe completely intact PR-based photosystems that can be functionally expressed in E. coli, without addition of exogenous photopigment (e.g., retinal or its precursors). Analyses of insertional mutants verified the functional annotation of each gene product in the photosystem biosynthetic pathway. We also show that light-activated, PR-catalyzed proton translocation, by the chemiosmotic potential it generates, activates photophosphorylation in E. coli.

Results 

Screening  a  Fosmid  Library  for  in  Vivo  PR  Photosystem  Expression. 

When E. coli expresses a PR apoprotein from an inducible promoter on a high-copy number plasmid, the cells acquire a red or orange pigmentation in the presence of exogenous all-trans retinal (1, 12). Retinal addition is required because E. coli lacks the ability to biosynthesize retinal or its precursor, β-carotene. Based on these observations, we screened on retinal-containing LB agar plating medium for PR-containing clones, which we expected would display an orange to red phenotype under these conditions. To enhance assay sensitivity, we used the copy-control system present in our fosmid vector that allowed a controlled transition from one copy per cell to multiple (up to 100) vector copies upon addition of the inducer L-arabinose (13).

Fig. 1. Genetic and phenotypic analysis of PR photosystem transposon mutants. (A) Schematic representation of the PR gene clusters identified in this work. Predicted transcription terminators in the clusters are indicated. (B) Color phenotype of intact cells of transposon-insertion mutants grown in liquid cultures with arabinose. (C) Retinal biosynthesis pathway. Names of genes encoding pathway enzymes are indicated. The genes that are present in E. coli are in parentheses. (D) HPLC profiles of wild-type and transposon mutant extracts. Detection wavelengths are indicated. Absorption spectra of relevant peaks, including standards used for identification, are shown on the top for each panel.

A fosmid library prepared from ocean surface water picoplankton containing 12,280 clones (≈440 Mb of cloned DNA) (14) was screened by using the above approach. Three orange colonies were identified as potential PR-expressing clones on the LB-retinal-L-arabinose agar plates. All three showed no pigmentation in the absence of the high-copy number inducer. Unexpectedly, these clones also displayed an orange phenotype in the absence of L-retinal when induced to high copy number. The sequence of one clone, HF10_19P19, revealed the presence of a PR gene near the fosmid vector junction (see below). Because the clones exhibited orange pigmentation in the absence of exogenous retinal, we expected that they must also be expressing retinal biosynthetic genes. Two clones, HF10_25F10 and HF10_19P19, were analyzed further for PR photosystem gene expression and function.
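As a quick consistency check on the library figures quoted above (12,280 clones, approximately 440 Mb of cloned DNA, three pigmented colonies), the mean insert size and the hit rate of the screen can be computed directly. The sketch below is illustrative; the function name `mean_insert_kb` and the constants are introduced here, and 1 Mb is taken as 1,000 kb.

```python
CLONES = 12_280          # clones in the fosmid library (from the text)
CLONED_DNA_MB = 440.0    # approximate total cloned DNA (from the text)
ORANGE_COLONIES = 3      # pigmented colonies found in the screen
KB_PER_MB = 1000.0


def mean_insert_kb(total_mb, n_clones):
    """Average cloned insert length in kb."""
    return total_mb * KB_PER_MB / n_clones


MEAN_INSERT_KB = mean_insert_kb(CLONED_DNA_MB, CLONES)
HIT_FRACTION = ORANGE_COLONIES / CLONES

print("mean insert: %.1f kb" % MEAN_INSERT_KB)
print("hit rate: %.3f%% of clones" % (100 * HIT_FRACTION))
```

The result, about 36 kb per clone, is in the range generally expected for fosmid inserts, so the two library figures are mutually consistent.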

Genomic  Analyses  of  Candidate  PR  Photosystem-Expressing  Clones. 

The full DNA sequence of the two putative PR photosystem-containing fosmids was obtained by sequencing a collection of transposon-insertion clones. The approach facilitated rapid DNA sequencing while simultaneously providing a set of precisely located insertion mutants for phenotypic analysis of specific gene functions (15).

Both PR photosystem-containing clones were derived from Alphaproteobacteria based on ORF content similarity to homologues in the National Center for Biotechnology Information nonredundant protein database [supporting information (SI) Tables 1 and 2]. The clones exhibited highest identity to other PR-containing BAC clones from Alphaproteobacteria from the Mediterranean and Red Seas (8). This similarity was evident across the entire cloned insert, although some large-scale rearrangements were apparent. The HF10_19P19 PR-inferred protein sequence was most similar to a homologue from another environmental BAC, MedeBAC66A03 (67% identity, 83% similarity). The MedeBAC66A03 PR was previously reported to exhibit fast photocycle kinetics and light-activated proton translocation when expressed in E. coli in the presence of exogenous retinal (8). Clone HF10_25F10 PR was most similar in inferred protein sequence to another BAC clone, RED17H08 PR (93% identity, 97% similarity) and was very similar to MedeBAC66A03 as well (62% identity, 78% similarity). Both of the PR genes analyzed here encoded proteins with a glutamine residue at position 105, a characteristic of blue light-absorbing PRs (5) and consistent with the orange pigmentation observed in clones expressing them.

Fig. 2. Proton-pumping assays. pH measurements are expressed as pH change with respect to the pH at time 0 for each sample. Gray boxes indicate dark periods.

Adjacent to the PR gene in both clones was a predicted six-gene operon encoding putative enzymes involved in β-carotene and retinal biosynthesis (Fig. 1A). A similar arrangement was reported in MedeBAC66A03 and RED17H08 (8) and more recently in a wide variety of diverse marine bacterial groups (7). The genes encoded on these operons include crtE [putative geranylgeranyl pyrophosphate (GGPP) synthase], crtI (phytoene dehydrogenase), crtB (phytoene synthase), crtY (lycopene cyclase), blh (15,15'-β-carotene dioxygenase), and idi [isopentenyl diphosphate (IPP) δ-isomerase]. The putative role of these proteins in the retinal biosynthetic pathway (for review, see ref. 16) is indicated in Fig. 1C. The first reactions in the pathway are catalyzed by the IPP δ-isomerase and farnesyl diphosphate (FPP) synthase. Both enzymes are part of the isoprenoid and ubiquinone metabolic pathways and are present in E. coli. crtE, crtB, crtI, and crtY appear to encode all of the enzymes necessary to synthesize β-carotene from FPP. The blh gene found in MedeBAC66A03 was previously shown to encode a 15,15'-β-carotene dioxygenase that cleaves β-carotene, producing two molecules of all-trans-retinal (8).

Apart from the PR and putative β-carotene and retinal biosynthesis operon, no other genes were shared between the two PR-containing fosmids. With the exception of a gene encoding a putative deoxyribodipyrimidine photolyase in HF10_25F10, no other genes flanking the PR photosystem had an obvious, light-related function (SI Tables 1 and 2).

Genetic and Phenotypic Analysis of the PR Photosystem. To obtain direct evidence for the functional annotations of putative retinal biosynthesis genes, we analyzed the phenotypes of different transposon mutants that carried an insertion in predicted PR photosystem genes. The cell pigmentation and HPLC pigment analyses in selected mutants are shown in Fig. 1 B and D, respectively. HF10_19P19 cells carrying the intact vector were orange when grown in the presence of arabinose, consistent with expression of a blue-adapted retinal-PR complex. HPLC analysis revealed the presence of retinal in cell extracts, demonstrating that clone HF10_19P19 contained all genes required for retinal biosynthesis in E. coli. Neither lycopene nor β-carotene was observed in the intact clone extracts, indicating that there was little if any accumulation of pigment intermediates (Fig. 1D). Cells containing transposon insertions in the idi gene were also orange and contained retinal. The lack of phenotype in this mutant can be attributed to the presence of the endogenous idi gene in E. coli (17).

As expected, transposon insertion mutants disrupted in the PR gene itself were devoid of orange pigmentation, and HPLC analysis showed a low but detectable level of retinal in these extracts (data not shown). It is unclear at present whether the low levels of retinal were due to polar effects caused by the transposon insertion in downstream expression or whether they result from pathway inhibition due to product accumulation.


Transposon-insertion mutants in crtE, crtB, and crtI showed no
pigmentation, as expected for this biosynthetic pathway if it is
interrupted before lycopene formation, the first colored product
in the pathway. crtY-insertion mutants, however, were pink,
suggesting that they were accumulating lycopene. Pigment analysis
verified that crtY-insertion mutant extracts contained lycopene
but not retinal or β-carotene (Fig. 1 B and D). Finally,
blh-insertion mutants had a yellow phenotype, and HPLC analysis
showed that these cells lacked detectable retinal but instead
accumulated β-carotene. This finding demonstrates that the blh
gene in HF10_19P19 encodes a 15,15'-β-carotene dioxygenase,
similar to a homologue recently described by Sabehi et al. (8).
Transposon insertions in all other predicted genes outside the
PR cluster had no visibly obvious phenotype. Identical pigmentation
phenotypes were observed with insertions in the corresponding
genes of the other PR photosystem clone, HF10_25F10.
Taken together, these results strongly support the functional
assignments of PR-associated retinal biosynthetic pathway genes
and demonstrate that they are necessary and sufficient to induce
retinal biosynthesis in E. coli.
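The insertion phenotypes above follow from a linear pathway in which a disrupted step causes the preceding intermediate to accumulate. The sketch below illustrates that reasoning only. The names of the intermediates before lycopene (GGPP, phytoene) are standard carotenoid biochemistry rather than something stated in the text, and the `predict` helper is hypothetical. Disruption of the PR gene itself is not modeled.

```python
# Ordered steps of the retinal pathway: (gene, product of that step).
# GGPP and phytoene are assumed from standard carotenoid biochemistry.
PATHWAY = [
    ("crtE", "GGPP"),
    ("crtB", "phytoene"),
    ("crtI", "lycopene"),
    ("crtY", "beta-carotene"),
    ("blh", "retinal"),
]

# Genes whose loss is masked by an endogenous E. coli copy (idi, per the text).
HOST_COMPLEMENTED = {"idi"}

# Colony colors reported in the text for each accumulated end product.
COLOR = {
    "FPP": "none",
    "GGPP": "none",
    "phytoene": "none",
    "lycopene": "pink",
    "beta-carotene": "yellow",
    "retinal": "orange",  # retinal bound to PR
}


def predict(disrupted_gene=None):
    """Return (accumulated compound, colony color) for one insertion mutant."""
    pathway_genes = [gene for gene, _ in PATHWAY]
    if (disrupted_gene is not None
            and disrupted_gene not in pathway_genes
            and disrupted_gene not in HOST_COMPLEMENTED):
        raise ValueError(f"gene not modeled: {disrupted_gene}")
    product = "FPP"  # starting substrate supplied by the host
    for gene, compound in PATHWAY:
        if gene == disrupted_gene:
            break  # the pathway stops at the disrupted step
        product = compound
    return product, COLOR[product]


for gene in (None, "idi", "crtE", "crtB", "crtI", "crtY", "blh"):
    print(gene, predict(gene))
```

Because idi is complemented by the host, the model treats an idi insertion like the intact clone, which matches the orange, retinal-containing phenotype reported above.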

Light-Activated Proton Translocation. We assayed both HF10_19P19
and HF10_25F10 grown under high-copy number conditions for
light-activated proton-translocating activity. Light-dependent
decreases in pH were observed in PR⁺ clones but not in mutants
containing a transposon insertion in the PR gene (PR⁻) (Fig. 2A).
In addition, no light-dependent proton-translocating activity was
observed in insertion mutants unable to synthesize retinal (CrtY⁻
or Blh⁻). In contrast, Idi⁻ mutants had normal proton-pumping
activity, confirming that this gene was not required under our
growth conditions (Fig. 2B). These results demonstrate that both
fosmids independently expressed a functional PR with
light-activated proton-translocating activity.

PR-Driven Proton Translocation Results in Photophosphorylation in E.
coli. Analogous to earlier studies of haloarchaeal bacteriorhodopsins
(18, 19), it was previously postulated that light-activated,
PR-induced proton motive force could drive ATP synthesis as
protons reenter the cell through the ATP synthase complex (Fig.
3A) (1, 12). This hypothesis was not previously tested, however,
in either native or heterologously expressed PR-based photosystems.
To test it, we measured light-induced changes in ATP
levels in the PR-photosystem-containing clones and PR⁻ mutant
derivatives by using a luciferase-based assay. The assay measures
total ATP, and so we expected to observe increases in ATP
concentration only if PR-driven ATP biosynthesis exceeded
endogenous turnover rates under our experimental conditions.
Control pH measurements indicated that PR⁺ cells used in the
ATP assay were indeed capable of light-activated proton
translocation (Fig. 3B). ATP measurements performed after 5 min of
illumination showed significant light-induced increases in cellular
ATP levels in the PR⁺ clone but not in a PR⁻ mutant (Fig.
3C). The 0.3-pmol increase in ATP observed in the PR⁺ cells
exposed to light (Fig. 3C) represents a 29% increase over
identical cell preparations maintained in the dark, which corresponds
to a net gain after 5 min of illumination of ≈2.2 × 10^
molecules of ATP per colony-forming unit (or viable cell)
assayed. For comparison, oxidative phosphorylation, measured
by ATP increases observed 5 min after the addition of 0.2%
succinate to PR⁺ cells in the dark, resulted in a net gain of 9 ×
10^ ATP molecules per live cell (data not shown).

Fig. 3. Light-activated, PR-enabled photophosphorylation in E. coli. (A)
Diagram of the proposed mechanism of PR-dependent ATP synthesis. The
effects of the inhibitors used are indicated. (B) Proton-pumping assays with
HF10_19P19 cells. pH measurements are expressed as the pH change with
respect to the pH at time 0 for each sample. CCCP, 25 μM; DCCD, 1 mM. Gray
boxes indicate dark periods. (C) ATP assays with HF10_19P19 cells. Results are
expressed as the difference between the ATP level in the light and the ATP level
in the dark, ΔATP, for each treatment.
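The arithmetic linking the reported ATP figures can be sketched as follows. This is an illustrative calculation, not the authors' analysis. Only the 0.3-pmol increase and the 29% figure come from the text. The `molecules_per_cell` helper and its `viable_cells` argument are hypothetical, because the colony-forming-unit count of the assayed sample is not given here.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

delta_atp_pmol = 0.3      # light-minus-dark ATP increase reported in the text
relative_increase = 0.29  # 29% increase over the dark control

# Dark baseline implied by the two reported numbers.
dark_atp_pmol = delta_atp_pmol / relative_increase

# Convert the increase from picomoles to molecules.
delta_atp_molecules = delta_atp_pmol * 1e-12 * AVOGADRO


def molecules_per_cell(total_molecules, viable_cells):
    """Per-cell gain; viable_cells is the CFU count in the assayed volume."""
    return total_molecules / viable_cells


print(f"implied dark ATP: {dark_atp_pmol:.2f} pmol")
print(f"ATP gained in the light: {delta_atp_molecules:.2e} molecules")
```

Dividing the total by a measured viable-cell count with `molecules_per_cell` would give the per-cell figure quoted in the paragraph.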

Similar light-activated, PR-catalyzed photophosphorylation
was also observed in cells containing the HF10_25F10 fosmid.
Although the PR photosystem of this clone is similar to that of
HF10_19P19, the genes flanking the two photosystem gene suites
are completely different and derive from different chromosomal
contexts. Because the PR and retinal biosynthetic genes are the
only genes shared by both clones, the results strongly suggest that
these specific PR photosystem genes are both necessary and
sufficient to drive photophosphorylation in E. coli cells.

To characterize more fully the light-driven photophosphorylation
observed in PR⁺ E. coli cells, we tested the effects of
carbonylcyanide m-chlorophenylhydrazone (CCCP), an uncoupler,
and N,N′-dicyclohexylcarbodiimide (DCCD), a covalent
inhibitor of H⁺-ATP synthase, on light-driven ATP synthesis
(Fig. 3). We used concentrations that inhibited aerobic growth
of E. coli on succinate, an oxidative phosphorylation process
requiring both proton-motive force and ATP synthase activity
(20). Addition of CCCP, which permeabilizes the cell membrane
to H⁺, completely abolished the light-driven decrease in pH and,
consequently, photophosphorylation. This result demonstrates
that both processes depend on the establishment of an
electrochemical gradient. In contrast, addition of the H⁺-ATP
synthase inhibitor DCCD did not affect external pH changes
resulting from PR-catalyzed proton translocation, but it
completely abolished photophosphorylation. This result indicates
that H⁺-ATP synthase is indeed responsible for the
light-activated ATP increases we observed in PR⁺ cells.
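The logic of the inhibitor experiment can be written as a small qualitative model. It is a sketch of the chemiosmotic reasoning in the paragraph above, with hypothetical function and key names, and it predicts only presence or absence of each effect, not magnitudes.

```python
def predict(pr_active, light, cccp=False, dccd=False):
    """Qualitative outcome under the chemiosmotic scheme described above."""
    # PR pumps protons outward only when illuminated. CCCP makes the
    # membrane permeable to H+, so neither a gradient nor an external
    # pH drop persists.
    gradient = pr_active and light and not cccp
    # ATP synthesis needs the gradient and an uninhibited H+-ATP synthase.
    atp_gain = gradient and not dccd
    return {"external_pH_drop": gradient, "atp_gain": atp_gain}


print("no inhibitor:", predict(True, True))
print("CCCP:        ", predict(True, True, cccp=True))
print("DCCD:        ", predict(True, True, dccd=True))
print("PR- mutant:  ", predict(False, True))
```

The DCCD case is the informative one: proton pumping persists while ATP gain is lost, which is what places the ATP synthase downstream of the proton gradient.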

Discussion 

The results presented here demonstrate the utility of functionally
screening large-insert DNA “metagenomic” libraries for new
phenotypes and activities directly and without subcloning, an
approach pioneered by soil microbiologists (15, 21, 22). Although
large-insert libraries increase the probability of capturing
complete metabolic pathways in a single clone, their low copy
number decreases the sensitivity of detecting heterologous gene
expression. We show here that increasing fosmid copy number
(13) can significantly enhance detectable levels of recombinant
gene expression and therefore increase the detection rate of
desired phenotypes in metagenomic libraries. The PR photosystem
recombinants we characterized could be detected visually by
pigment production and exhibited light-dependent proton
translocation and subsequent photophosphorylation only when the
fosmid vector was induced to high copy number. The approach
was not completely effective in detecting all targeted genotypes,
however. Even under “copy-up” conditions, we were unable to
detect all PR-containing clones known to exist in our library (4,
7). Despite the limitations, this approach for functional screening
of microbial community genomic libraries is useful for
identifying specific activities or phenotypes, even in the absence
of sequence information. Additionally, this approach provides
useful material for downstream genetic and biochemical
characterization and for testing hypotheses derived from
bioinformatic analyses.

Genetic and biochemical characterization of the PR
photosystem-containing clones reported here provided direct evidence
that only six genes are required to fully enable light-activated proton
translocation and photophosphorylation in a heterologous
host. Sabehi et al. (8) demonstrated previously that coexpression
of marine bacterial blh with PR, in the presence of the β-carotene
biosynthetic genes from Erwinia herbicola, led to β-carotene
cleavage and subsequent formation of retinal-bound PR. We
show here that a set of six genetically linked genes known to be
found in a wide variety of different marine bacterial taxa (7, 8,
11) are both necessary and sufficient for the complete synthesis
and assembly of a fully functional PR photoprotein in E. coli.

These heterologously expressed marine bacterial photosystems
exhibited light-dependent proton translocation activity in the
absence of exogenously added retinal or β-carotene. One gene
in the PR photosystem cluster was dispensable under our
conditions: the idi gene that encodes IPP δ-isomerase, an activity
already present in E. coli (17), as is FPP synthase, which catalyzes
the next two steps in the pathway (23). The presence of the idi
gene in the cluster likely enables retinal production in the native
organism because isomerization of IPP to dimethylallyl
pyrophosphate can be a rate-limiting step in β-carotene biosynthesis
(24, 25).

It was previously postulated that light-activated proton
translocation catalyzed by PR elevates the proton-motive force,
thereby driving ATP synthesis as protons reenter the cell through
the H⁺-ATP synthase complex (1, 12). Although this capability
has been demonstrated for haloarchaeal bacteriorhodopsins (18,
19, 26, 27), PR-based photophosphorylation has not been
demonstrated previously. Our data demonstrate that illumination of
cells expressing a native marine bacterial PR photosystem
generates a proton-motive force that does indeed drive cellular ATP
synthesis. Under our experimental conditions, 5 min of
illumination resulted in a net gain of 2.2 × 10^ ATP molecules per cell.
It should be quite possible to use the light-to-biochemical
energy conversion enabled by the PR photosystem for
biosynthetic purposes.

The  PR  photosystem-catalyzed  photophosphorylation  described 
here  is  consistent  with  a  proposed  role  for  PR  in  marine  microbial 
photoheterotrophy.  A  few  previous  studies  were  unable  to  detect 
light-enhanced  growth  rates  or  yields  in  PR-containing  isolates 
grown  in  seawater  or  natural  seawater  incubations  (28,  29).  In  one 
recent  report,  light  stimulation  of  growth  rate  or  yield  in  Pelagibacter 
ubique,  a  ubiquitous  PR-containing  marine  planktonic  bacterium, 
could  not  be  detected.  These  negative  results  are  somewhat  difficult 
to  interpret  because  natural  seawater  incubations  are  by  necessity 
chemically  undefined,  and  preferred  growth  substrates  or  other 
limiting  nutrients  in  these  experiments  were  unknown.  In  contrast, 
a significant enhancement of both growth rate and yield was
recently reported in a PR-expressing marine flavobacterium (11),
albeit a direct link between PR and the light-induced growth
stimulation was not conclusively demonstrated. Our direct
observation of intact PR photosystem gene expression and subsequent
photophosphorylation, the recently reported light-enhanced
growth  rates  and  yields  of  PR-containing  Flavobacteria  (11),  and 
the  general  ubiquity  of  PR  photosystem  genes  in  diverse  microbial 
taxa  of  the  ocean’s  photic  zone  (2-9)  all  strongly  support  a 
significant  role  for  PR-based  phototrophy  in  planktonic  marine 
microorganisms. 

In different physiological, ecological, phylogenetic, and
genomic contexts, PR activity may benefit cells in a variety of
ways, some not directly related to enhanced growth rates or
yields. In some bacteria, the H⁺-ATP synthase functions as an
ATPase under low respiratory conditions, hydrolyzing ATP and
driving proton efflux to maintain the proton-motive force (30).
In the light, PR activity could offset this effect and reverse
conditions from ATP consumption to ATP production. PR
contributions to cellular energy metabolism are likely to be
particularly important in starved or substrate-limited cells.
Similar to the situation for some Haloarchaea, which use
bacteriorhodopsin under oxygen-limiting conditions (18, 19, 26, 27), low
respiratory rates may trigger PR expression or activity in marine
bacteria, as well. The PR-generated proton-motive force can also
be directly coupled to other energy-requiring cellular activities,
including flagellar motility or active transport of solutes into or
out of the cell (32-34). This phenomenon was recently
demonstrated by the coupling of PR activity to flagellar rotation in E.
coli (31). Although a sensory function for some PR variants is
also a possibility (10, 35), the PR photosystems described here,
and the vast majority of others observed to date (1, 3, 4, 7, 8, 11),
are not genetically linked to sensory transducers, the hallmark of
all known sensory rhodopsins. Almost all PRs characterized so far
therefore appear to function as light-activated ion pumps.

The marine PR family is remarkably diverse, has a widespread
phylogenetic distribution, and is functional in both ether-linked
phytanyl and ester-linked fatty acid membrane systems of
Archaea and Bacteria (4, 7, 10). A recent survey found that
one-third of PR clones examined were colocalized with
photopigment biosynthetic operons (7). Operon arrangement and
distribution, as well as phylogenetic relationships, suggest that
lateral transfer of PR photosystem genes is relatively common
among diverse marine microbes (4, 7, 10). The observations
reported here demonstrate that acquisition of just a few genes
can lead to functional PR photosystem expression and
photophosphorylation. The ability of a single lateral transfer event to
confer phototrophic capabilities likely explains the ecological
and phylogenetic prevalence of these photosystems in nature. In
principle, any microorganism capable of synthesizing FPP (a
widespread intermediate in isoprenoid biosynthesis) could
readily acquire this capability, as we have observed in E. coli.

Apparently,  many  otherwise  chemoorganotrophic  microbes  in 
the ocean’s photic zone have acquired the ability to use light energy
to  supplement  cellular  energy  metabolism.  The  broad  array  of 
PR-containing  microbes  reflects  the  photosystem’s  fundamental 
contribution  to  cellular  bioenergetics,  a  simplicity  and  compactness 
that  favors  PR  photosystem  lateral  mobility,  and  a  remarkable 
plasticity  that  enables  photoprotein  assembly  and  function  in  a 
diversity  of  phylogenetic  groups  and  cell  membrane  types.  From  a 
genetic,  physiological,  and  ecological  perspective,  the  transition 
from  heterotrophy  to  PR-enabled  photoheterotrophy  seems  to 
represent  a  relatively  small  evolutionary  step  for  contemporary 
microorganisms. 

Materials  and  Methods 

Fosmid Library. The HOT.10m fosmid library screened in this
work has been described previously (14). It contains DNA from
a planktonic sample collected 10 m below the surface at the
ALOHA station (22°45' N, 158°W) of the Hawaii Ocean Time
series (HOT) cloned into the copy-controlled pCC1FOS fosmid
vector (Epicentre Biotechnologies, Madison, WI). The library
host, E. coli EPI300 (Epicentre Biotechnologies), supports the
copy-control option of pCC1FOS.

Screening for PR Expression. High-density colony macroarrays
[12,280 clones of the HOT.10m library (ref. 14)] were prepared
on a Performa II filter (Genetix Ltd., Boston, MA) by using a
Q-PixII robot (Genetix Ltd.). The filter was carefully laid over a
22-cm plate containing 250 ml of LB agar supplemented with
L-arabinose (0.02%), the copy-up inducer, and all-trans-retinal
(20 μM), and the plate was incubated at 37°C for 24 h. The filters
were used to facilitate the visual detection of color against the
white background. Colonies were inspected visually for the
appearance of orange or red color. Fosmid DNA from positive
clones was retransformed into fresh E. coli EPI300 and
rescreened as above to verify that the color was conferred by the
fosmid. The end DNA sequence of the positive clones was
obtained by using primers T7 and EpiFos5R as described
previously (14).

In Vitro Transposition and Full Fosmid Sequencing. Fosmid clones to
be characterized were submitted to random in vitro transposition by
using the EZ-Tn5<kan-2> insertion kit (Epicentre
Biotechnologies) according to the manufacturer’s instructions. The
transposition reaction was transformed by electroporation into EPI300 cells,
and clones containing fosmids with Tn5 insertions were selected in
LB containing chloramphenicol and kanamycin (12 μg/ml and 25 μg/ml,
respectively). The color phenotype of individual Tn5-insertion clones was
analyzed on LB plates containing chloramphenicol, kanamycin,
and 0.02% L-arabinose as above. DNA sequencing off the Tn5 ends
was performed by using KAN-2 FP-1 and KAN-2 RP-1 primers, a
BigDye version 3.1 cycle sequencing kit, and an ABI Prism 3700 DNA
analyzer (Applied Biosystems, Foster City, CA). The complete
DNA sequence was assembled by using Sequencher version 4.5
(Gene Codes Corporation, Ann Arbor, MI) and annotated with
FGENESB (Softberry, Mount Kisco, NY) and Artemis version 6
(The Wellcome Trust Sanger Institute, Cambridge, U.K.).

Carotenoid Extraction. Overnight cultures of the appropriate
clones were diluted 1:100 into 50 ml of LB chloramphenicol (12
μg/ml) and incubated for 3 h at 37°C with shaking (200 rpm). At
that point, L-arabinose was added to a 0.02% final concentration,
and cultures were incubated for 16 h. Cells were harvested by
centrifugation and rinsed twice in salt solution. Cell pellets were
kept frozen (−20°C) in the dark. Frozen cells were extracted by
sonication (5 min) in a cold 4:1 (vol/vol) mixture of
acetone/methanol (OmniSolv; EMD Chemicals, Gibbstown, NJ) with 0.1
mM butylated hydroxytoluene added as an antioxidant. Cells
were pelleted by centrifugation, and the supernatant was
removed and filtered through ashed silica gel (230-400 mesh;
EMD Chemicals). Extracts were then concentrated by
evaporation under dry N₂. All extraction steps were performed in
darkness or low light to minimize carotenoid photooxidation.

HPLC Analysis. Chromatographic separation and analysis of
carotenoids by high-performance liquid chromatography (HPLC)
adapted a reverse-phase method from Barua and Olson (36). A
5-μm Zorbax-ODS C18 column (150 × 4.6 mm) (Agilent
Technologies, Palo Alto, CA) was used at 30°C in a column oven
with a Waters (Milford, MA) 2795 separations module operated
with MassLynx 4.0 software. Separation was achieved with a
linear gradient at a flow rate of 0.8 ml/min: 100% solvent A to
100% B over 20 min followed by isocratic elution with B for an
additional 20 min, where A = methanol/water (3:1 vol/vol) and
B = methanol/dichloromethane (4:1 vol/vol). The column was
equilibrated after each run with solvent A for 10 min. The
detector was a Waters 996 photodiode array detector scanning
wavelengths from 190 to 800 nm with a resolution of 1.2 nm and
sampling rate of one spectrum per s. Carotenoids were identified
by comparing absorbance spectra and retention times with
authentic standards.
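The solvent program described above can be expressed as a function of run time. The sketch below is illustrative only. It assumes the return to solvent A at 40 min is an instantaneous step, which the text does not specify, and the `percent_b` name is hypothetical.

```python
FLOW_ML_PER_MIN = 0.8   # flow rate stated in the text
RUN_MINUTES = 50        # 20-min ramp + 20-min hold + 10-min re-equilibration


def percent_b(t_min):
    """Percent solvent B at time t_min (minutes) after injection."""
    if t_min < 0 or t_min > RUN_MINUTES:
        raise ValueError("time outside the 50-min program")
    if t_min <= 20:
        return 100.0 * t_min / 20.0  # linear ramp from 100% A to 100% B
    if t_min <= 40:
        return 100.0                 # isocratic elution with B
    return 0.0                       # re-equilibration in A (assumed step change)


for t in (0, 10, 20, 30, 45):
    print(f"t = {t:2d} min: {percent_b(t):5.1f}% B")

print(f"solvent per run: {FLOW_ML_PER_MIN * RUN_MINUTES:.0f} ml")
```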

Proton-Pumping Experiments. Clones to be analyzed for
proton-pumping activity were streaked on 15-cm LB agar plates
containing 12 μg/ml chloramphenicol and 0.001% L-arabinose and
incubated at 37°C for 48 h. Cells were resuspended in 20 ml of
salt solution (10 mM NaCl/10 mM MgCl₂/100 μM CaCl₂, pH 7.0),
rinsed twice, and adjusted to an A₆₀₀ of 0.5-0.7. Two milliliters
of cell suspension was placed in an RPC-100 photosynthetic
chamber (i-Works, Dover, NH) connected to a 22°C circulating
water bath. pH was measured by using a Beckman (Fullerton,
CA) Φ360 pH meter equipped with a Futura microelectrode.
Light was provided by a 160-watt halogen lamp placed 4 cm from
the chamber. Irradiance within the chamber was 500-650 μmol
Q m⁻² s⁻¹.

ATP Measurements. Cell suspensions were prepared as above.
Three milliliters of cell suspension was placed in 5-ml screw-cap
glass vials. The vials for the dark samples were wrapped in foil.
Ten centimeters of water was used to minimize heat transfer to
the samples. Irradiance under these conditions was 650 μmol Q
m⁻² s⁻¹. ATP was measured by using a luciferase-based assay
(BacTiter-Glo, Promega, Madison, WI) as follows. At each time
point, 5 aliquots (20 μl each) of every sample were dispensed into
white 96-well assay plates [CoStar (Bethesda, MD) 3917]. One
hundred microliters of BacTiter-Glo reagent was added per well,
and luminescence was measured after 10 min using a Victor3
plate reader (PerkinElmer, Waltham, MA) with a 1-s integration
time. An ATP standard curve was used to calculate the
concentration of ATP in the samples. For inhibitor experiments, cell
suspensions were incubated in the dark for 20 h in the presence
of 1 mM DCCD or for 2 h in the presence of 25 μM CCCP or
the ethanol vehicle. Succinate was added to a 0.2% final
concentration to measure ATP synthesis from respiration in the
dark.
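Converting luminescence to ATP with a standard curve, and then taking the light-minus-dark difference, can be sketched as below. Every number in this example is hypothetical and chosen only to make the arithmetic easy to follow. The linear form of the curve and the helper names are likewise assumptions, not details given in the methods.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def atp_from_luminescence(rlu, slope, intercept):
    """Invert the standard curve to get ATP from a luminescence reading."""
    return (rlu - intercept) / slope


def mean(values):
    return sum(values) / len(values)


# Hypothetical standards: ATP per well (pmol) and luminescence (RLU).
std_atp = [0.0, 0.5, 1.0, 2.0, 4.0]
std_rlu = [100.0, 1100.0, 2100.0, 4100.0, 8100.0]
slope, intercept = fit_line(std_atp, std_rlu)

# Hypothetical readings for the 5 replicate aliquots of each sample.
light_rlu = [2700.0, 2750.0, 2650.0, 2720.0, 2680.0]
dark_rlu = [2100.0, 2080.0, 2120.0, 2090.0, 2110.0]

atp_light = atp_from_luminescence(mean(light_rlu), slope, intercept)
atp_dark = atp_from_luminescence(mean(dark_rlu), slope, intercept)
delta_atp = atp_light - atp_dark  # light minus dark, as in Fig. 3C

print(f"light: {atp_light:.2f} pmol, dark: {atp_dark:.2f} pmol")
print(f"delta ATP: {delta_atp:.2f} pmol")
```

Averaging the replicate wells before inverting the curve is one reasonable choice. Converting each well first and then averaging gives the same result for a linear curve.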
In the modern oceans, diatoms, dinoflagellates,
and coccolithophorids play prominent
roles in primary production (Falkowski et al.
2004). The biological observation that these
groups acquired photosynthesis via
endosymbiosis requires that they were preceded in
time by other photoautotrophs. The geological
observation that the three groups rose to
geobiological prominence only in the Mesozoic
Era also requires that other primary producers
fueled marine ecosystems for most of Earth
history. The question, then, is: what did
primary production in the oceans look like before
the rise of modern phytoplankton groups?

In this chapter, we explore two records
of past primary producers: morphological
fossils and molecular biomarkers. Because
these two windows on ancient biology are
framed by such different patterns of
preservational bias and diagenetic selectivity,
they are likely to present a common picture
of stratigraphic variation only if that view
reflects evolutionary history.

I. RECORDS OF PRIMARY PRODUCERS IN ANCIENT OCEANS


A.  Microfossils 

Microfossils, preserved as organic cell
walls or mineralized tests and scales, record
the morphologies and (viewed via
transmission electron microscopy) ultrastructures of
ancient microorganisms. Such fossils can
provide unambiguous records of
phytoplankton in past oceans (diatom frustules,
for example), and they commonly occur
in large population sizes, with numerous
occurrences that permit fine stratigraphic
resolution and wide geographic coverage.

Set against this are a number of factors
that limit interpretation. Not all
photoautotrophs produce preservable cell walls or
scales, and, of those that do, not all
generate fossils that are taxonomically
diagnostic. Thus, although many modern diatoms
precipitate robust frustules of silica likely
to enter the geologic record, others secrete
weakly mineralized shells with a
correspondingly lower probability of
preservation. Similarly, whereas dinoflagellates as a
group have left a clear record of dinocysts,
many extant species do not produce
preservable cysts and others form cysts that
would not be recognized unambiguously
as dinoflagellate in fossil assemblages. (The
phylogenetic affinities of fossil dinocysts
are established by the presence of an
archeopyle, a distinctive excystment mechanism
peculiar to but not universally found within
dinoflagellates.) Especially in the early
history of a group, character combinations that
readily distinguish younger members may
not be in place. Thus, stem group diatoms
without well-developed frustules might
well leave no morphologic record at all in
sediments.

By virtue of their decay-resistant
extracellular sheaths and envelopes, many
cyanobacteria have a relatively high probability
of entering the fossil record, and some
benthic lineages are both readily preservable
and morphologically distinctive (Knoll and
Golubic 1992). On the other hand,
important picoplankton such as Prochlorococcus are
unlikely to leave recognizable body (or, as
it turns out, molecular) fossils. A number of
algal clades include good candidates for
fossilization, especially groups with distinctive
resting stages (phycomate prasinophytes,
dinoflagellates) or mineralized skeletons
(diatoms, coccolithophorids, coralline reds,
caulerpalean and dasyclad greens). Other
primary producers fossilize occasionally but
only under unusual depositional or
diagenetic circumstances (e.g., Butterfield 2000;
Xiao et al. 2002, 2004; Foster and Afonin 2006),
and still others rarely if ever produce
morphologically interpretable fossils.

Diagenesis can obliterate fossils as well
as preserve them: organic walls are subject
to postdepositional oxidation and
mineralized skeletons may dissolve in
undersaturated pore waters. The result is that
presence and absence cannot be weighted
equally in micropaleontology. The presence
of a fossil unambiguously shows that the
cell from which it derived lived at a certain
time in a particular place, but absence may
reflect true absence, low probability of
fossilization, or obfuscating depositional or
diagenetic conditions. For older time
intervals, tectonic destruction of the sedimentary
record imposes an additional challenge; in
particular, subduction inexorably destroys
oceanic crust and the sediments that mantle
it, so that deep-sea sediments are common
only in Jurassic and younger ocean basins.

B.  Molecular  Biomarkers 

The chemical constituents of biomass
produced by living organisms can be incorporated
into sediments and ultimately into
sedimentary rocks that can survive for billions of years.
Where these compounds are preserved in
recognizable forms, they represent another
opportunity for organisms to leave a trace
of themselves in the fossil record. Organic
biomarkers are the diagenetically altered
remains of the products of cellular
biosynthesis and may be aptly termed molecular
fossils. Most biomarkers are derived from
lipids and are potentially stable over
billion-year time scales under ideal conditions
(Brocks and Summons 2004).

Given the variety of organic compounds
produced by cells and the vast quantities of
sedimentary organic matter in a rock record
that stretches back billions of years,
biomarkers are a potentially rich source of
information concerning the diversity and ecology of
ancient communities. However, the process
of organic matter incorporation into rocks
and its transformation during deep burial
impose some strong constraints on the
kinds of information that can be recovered
millions of years after the fact. The classes of
molecules that contain molecular sequence
information, nucleic acids and most
proteins, do not survive long in the geologic
environment. DNA can survive for at least
a few hundred thousand years, especially in
reducing environments such as euxinic
sediments (e.g., Coolen et al. 2004) where
heterotrophy is curtailed by a lack of electron
acceptors, but it is not an option where the
aim is to look at changes on million-year or
longer time scales. Other kinds of
biomolecules, however, prove remarkably resilient
in the rock record.

Any molecule with a hydrocarbon
skeleton has the potential to be preserved over
long periods. For the most part, this refers
to the hydrocarbon portions of membrane
lipids, which are the major constituent
of extractable organic matter (bitumen)
in sedimentary rocks. Diagenesis quickly
strips these compounds of their reactive
polar functionalities, and over longer
periods causes stereochemical and structural
rearrangements, but hydrocarbon skeletons
can remain recognizable as the products of
particular biosynthetic pathways on time
scales that approach the age of the Earth
(e.g., Brocks and Summons 2004; Peters
et al. 2005).

The character of the information
contained in molecular fossils is variable.
Some are markers for the presence and,
to the extent they can be quantified
relative to other inputs, abundance of
particular organisms. The taxonomic specificity
of such biomarkers ranges from species to
domain level. Others are markers for the
operation of a particular physiology or
biosynthetic pathway that may have a broad
and/or patchy taxonomic distribution. Still
other kinds of biomarkers are most strongly
associated with specific depositional
settings, making their presence more
indicative of paleoenvironmental conditions than
of any particular biology. Interpretation of
the molecular fossil record depends on our
ability to recognize biomarker compounds,
link them to biosynthetic precursors, and
then make inferences about what the
presence of those molecules in the rock record
tells us about contemporary biology and
geochemistry.

Turning  to  biomarkers  that  might  estab¬ 
lish  a  molecular  fossil  record  of  primary  pro¬ 
duction  in  marine  settings,  several  classes 
of  compounds  are  promising  for  their  com¬ 
bination  of  biochemical  and/or  taxonomic 
specificity.  Pigments  are  natural  candidates, 
representing  markers  of  the  photosynthetic 
machinery  itself.  Input  of  chlorophyll  to  sedi¬ 
ments  can  result  in  several  kinds  of  molecular 
fossils,  including  porphyrins  and  the  pristane 
and  phytane  skeletons  of  the  chlorophyll 
side  chain  (Figure  1).  It  was  the  recognition 
of  vanadyl  porphyrin  as  the  molecular  fossil 
of  chlorophyll  that  led  Alfred  Treibs  (1936)  to 
make  the  first  compelling  chemical  argument 
for  the  biogenic  origin  of  petroleum.  Other 
pigments,  such  as  carotenoids,  are  subject  to 
very  selective  preservation,  generally  requir¬ 
ing  the  presence  of  reduced  sulfur  species; 
the functional groups that confer many of
their biophysical properties and taxonomic
specificity are lost through chemical reduction
processes early in diagenesis (e.g., Kohnen
et al. 1991, 1993; Hebting et al. 2006).

FIGURE 1. Structures of diagnostic phytoplankton lipids (left) and their fossil counterparts (right).

Although  the  preservation  of  pigment- 
derived  biomarkers  is  spotty  and  informa¬ 
tion  is  steadily  lost  over  time  as  diagenesis 
proceeds,  another  class  of  extraordinarily 
durable  molecules  provides  us  with  much 


of  the  molecular  fossil  record  of  primary 
producers,  particularly  in  Paleozoic  and 
older  rocks.  These  are  the  polycyclic  triter¬ 
penoids  produced  by  the  cyclization  of  the 
isoprenoid  squalene  and  found  in  the  mem¬ 
branes  of  both  eukaryotes  and  bacteria. 
The  main  types  are  the  steroids,  which  are 
ubiquitous  among  the  Eucarya  but  known 
from only a very few bacteria, and the
hopanoids, including the bacteriohopanepolyols
(BHPs), produced by a wide variety
of autotrophic and heterotrophic bacteria.
These  molecules  have  the  great  advan¬ 
tages  of  a  durable  polycyclic  skeleton  that 
is  clearly  a  biological  product  and  a  well- 
characterized  diagenetic  fate  involving  a 
number  of  rearrangements  that  provide 
information  about  the  postburial  history  of 
the  organic  matter.  The  structures  of  some 
commonly  used  biomarker  lipids  and  their 
fossil  counterparts  are  shown  in  Figure  1. 
Table  1  summarizes  current  knowledge  of 
the  biological  affinities  of  hydrocarbons 
commonly  found  in  marine  sediment  sam¬ 
ples,  excluding  biomarkers  derived  from 
terrestrial  organisms. 

Molecular  fossils  suffer  from  some  of 
the  same  limitations  as  body  fossils.  Not  all 
ecologically  and  biogeochemically  impor¬ 
tant  groups  leave  distinctive  molecular  fin¬ 
gerprints,  making  them  difficult  to  follow 
in  time.  Moreover,  the  structures  of  lipids 
are  not  nearly  as  diverse  as  body  fossils; 
different,  often  distantly  related  or  physi¬ 
ologically  disparate  organisms  can  produce 
similar  patterns  of  lipids.  Generally,  biomar¬ 
kers  will  reflect  an  average  of  inputs  to  sedi¬ 
ments,  which  can  be  influenced  by  factors 
including  bottom  water  oxygenation,  sedi¬ 
ment  mineralogy,  and  grain  surface  area 
available  for  sorptive  protection  (Hedges 
and  Keil  1995).  These  inputs  are  attenuated 
by  remineralization  of  organic  matter  as  it 
sinks:  this  reworking  is  >95%  complete  by 
3000 m depth (Martin et al. 1987). The high
degree  of  water-column  degradation  of 
organic  matter  in  deep  basins  means  that, 
even  where  deep-water  sediments  survive 
subduction,  they  commonly  contain  little 
organic  matter;  hence,  the  biomarker  record 
of  open-ocean  primary  production  is  poor. 
The  problem  of  interpreting  absence  can 
be  acute,  because  in  biomarker  analysis, 
absence  can  only  be  defined  in  terms  of 
detection  limits  and,  hence,  is  conditionally 
dependent  upon  the  analytical  technology 
available. 
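The depth attenuation of sinking organic matter mentioned above can be illustrated with a short calculation. This is only a sketch: it assumes the power-law flux curve and the open-ocean composite exponent (b = 0.858) commonly quoted from Martin et al. (1987), referenced to the flux at 100 m. The function name and the default parameters are illustrative choices, not part of the original text.

```python
def martin_flux_fraction(depth_m: float, ref_depth_m: float = 100.0, b: float = 0.858) -> float:
    """Fraction of the sinking organic carbon flux at ref_depth_m that survives to depth_m.

    Assumes a power-law attenuation, F(z) = F(ref) * (z / ref) ** (-b).
    The exponent b = 0.858 is an assumed open-ocean composite value.
    """
    if depth_m < ref_depth_m:
        raise ValueError("depth_m must be at or below the reference depth")
    return (depth_m / ref_depth_m) ** (-b)


if __name__ == "__main__":
    for z in (100, 500, 1000, 3000):
        frac = martin_flux_fraction(z)
        print(f"{z:>5} m: {frac:6.1%} of the 100 m flux remains")
```

Under these assumptions only about 5% of the flux leaving the upper 100 m reaches 3000 m. Because export at 100 m is itself only a fraction of surface production, this is in line with the >95% figure cited in the text.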


Much  of  what  is  known  about  the  dia¬ 
genesis  and  preservation  of  biomarkers 
derives  from  studies  of  the  origin  and  com¬ 
position  of  petroleum  (e.g.,  Peters  et  al. 
2005).  Petroleum  geologists  were  initially 
interested  in  identifying  the  source  rocks 
from  which  hydrocarbon  accumulations 
originated.  Information  on  the  thermal  his¬ 
tories  of  source  rocks  is  also  key  for  mod¬ 
eling  hydrocarbon  generation.  Biomarkers 
provide  a  way  to  determine  both  param¬ 
eters  through  complementary  analyses  of 
sedimentary  bitumen  in  source  rocks  and 
their  derived  oil  accumulations.  Companies 
serving  the  petroleum  exploration  indus¬ 
try  have  developed  and  maintained  data¬ 
bases  of  bitumen  and  oil  composition  that 
can  be  used  to  compare  oils  to  bitumen  in 
their  source  horizons  and  model  the  ther¬ 
mal  histories  of  petroleum  deposits.  These 
databases  can  be  employed  as  a  predictive 
tool  when  exploring  in  frontier  regions.  An 
example is the commercial "Oils" database
generated by GeoMark Research, which
records geochemical analyses of more than
10,000 crude oil samples from every known
petroliferous basin on the globe
(www.geomarkresearch.com). The "Oils" data comprise
the contents of S, Ni, and V; the carbon
isotopic  compositions  of  bulk  saturated  and 
aromatic  hydrocarbons;  and  quantitative 
analysis  of  approximately  100  individual 
hydrocarbons,  including  n-alkanes,  acyclic 
isoprenoids,  steroids,  and  triterpenoids. 
Abundances  of  the  latter  biomarkers,  which 
have  been  determined  using  a  rigorously 
reproducible  analytical  protocol,  allow  cal¬ 
culation  of  23  diagnostic  molecular  ratios 
that can be used to predict paleoenvironmental
features of an oil's source rock
without  direct  knowledge  of  the  rock  itself 
(Zumberge  1987)  or  to  evaluate  hydrocar¬ 
bon  charge  histories  from  field  to  basin 
scales  (e.g.,  Zumberge  et  al.  2005).  Aver¬ 
aging  of  data  from  numerous  oil  samples 
within  a  well,  field,  or  an  entire  basin  helps 
to  overcome  anomalies  in  individual  hydro¬ 
carbon  samples  that  reflect  differences  in 
maturity  and  losses  from  evaporation,  water 
TABLE 1. Hydrocarbon biomarkers prevalent in marine sediments and petroleum derived
from marine sediments and their known source organisms

Fossil hydrocarbon (a) | Functionalized precursors | Established sources (b, c) | References (d)
-----------------------------------------------------------------------------------------------
Bacteriohopanes | C35 bacteriohopanepolyols (BHPs) | Bacteria, although nonspecific | Rohmer et al. 1984, 1992
2-Methylhopanes | 2-Methyl-BHP | Cyanobacteria, although also in methanotrophs and other bacteria | Zundel and Rohmer 1985b, 1985c; Bisseret et al. 1985; Summons et al. 1999
3-Methylhopanes | 3-Methyl-BHP | Methanotrophs, other proteobacteria | Zundel and Rohmer 1985a
Aryl isoprenoids, isorenieratane | Aromatic carotenoids, e.g., isorenieratene, okenone | Green and purple sulfur bacteria | Summons and Powell 1987; Brocks et al. 2005
Gammacerane | Tetrahymanol | Purple nonsulfur bacteria, some protists | Ten Haven et al. 1989; Kleemann et al. 1990
Tricyclic terpanes, cheilanthanes | Unknown | Unknown, probably bacteria | Moldowan et al. 1983
<C20 acyclic isoprenoids | Bacterial and algal chlorophylls; archaeol | Photosynthetic bacteria and protists; Archaea, although nonspecific | Peters et al. 2005
>C20 acyclic isoprenoids | Glycerol ether lipids; also found as free hydrocarbons | Archaea, although nonspecific | Peters et al. 2005
Cholestane | Cholesterol and related C27 sterols | Photosynthetic protists, metazoa | Volkman 2003
Ergostane; 24-methylcholestane | Ergosterol and related sterols | Photosynthetic protists, prevalent in diatoms | Volkman 2003
Stigmastane; 24-ethylcholestane | Sitosterol, stigmasterol, and related sterols | Photosynthetic protists, prevalent in chlorophytes | Volkman 2003
24-n-Propylcholestane | 24-n-Propylcholesterol | Marine chrysophytes | Moldowan 1984
Dinosterane, triaromatic dinosteroids | Dinosterol, dinostanol | Dinoflagellates | Summons et al. 1987, 1992; Moldowan and Talyzina 1998
24-Norcholestane | 24-Norcholesterol and related 24-nor sterols | Diatoms | Rampen et al. 2005
Highly branched isoprenoids (HBIs) | Mono- or polyunsaturated HBIs | Diatoms | Volkman et al. 1994; Belt et al. 2000; Sinninghe Damste et al. 2004
Alkenones or alkanes | Alkenones | Haptophytes | Volkman et al. 1980; Marlowe et al. 1984

(a) In most cases, the connection between the fossil hydrocarbon and organismic source is supported by
detection of diagenetic intermediates in sediments.

(b) Extensive, systematic studies of lipids from microbial cultures are rare. Many lipid-organism relationships
remain unknown.

(c) Genomic sequencing will also help identify the metabolic potential of source organisms to produce
preservable compounds; such identification depends on (currently incomplete) knowledge of the biosynthetic
pathways of biomarker molecules.

(d) Indicative but incomplete list. Reviews that provide extensive citation lists include Brocks and Summons 2004
and Peters et al. 2005.




washing,  or  biodegradation.  Although  it  is 
not  widely  appreciated,  the  global  ubiquity 
of  sedimentary  bitumen  and  oil  accumula¬ 
tions  of  all  sizes  means  that  hydrocarbons 
can  also  serve  as  a  source  of  information 
on  trends  in  the  global  carbon  cycle  (e.g., 
Andrusevich  et  al.  1998,  2000)  and  patterns 
in  the  evolution  and  environmental  distri¬ 
butions  of  organisms,  as  discussed  further 
later. The data illustrated in Figures 2, 3, 4,
5,  and  6  come  from  the  "Oils"  database  and 
represent  the  averages  of  numerous  sam¬ 
ples  from  a  global  selection  of  prominent 
Cenozoic  to  Proterozoic  petroleum  systems, 
both  marine  and  lacustrine. 
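The averaging step described above, in which many oil samples from one well, field, or basin are pooled so that individual anomalies carry less weight, can be sketched as follows. The system names and ratio values are made up for illustration; they are not taken from the "Oils" database, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_by_system(samples):
    """Average a biomarker ratio over all samples from the same petroleum system.

    `samples` is an iterable of (system_name, ratio) pairs.
    """
    grouped = defaultdict(list)
    for system, ratio in samples:
        grouped[system].append(ratio)
    return {system: mean(ratios) for system, ratios in grouped.items()}


# Made-up values for two hypothetical systems; the 0.91 entry stands in for
# a sample altered by, e.g., biodegradation or water washing.
samples = [
    ("System A", 0.62),
    ("System A", 0.58),
    ("System A", 0.91),
    ("System B", 1.10),
    ("System B", 1.30),
]

averages = average_by_system(samples)
for system, value in sorted(averages.items()):
    print(f"{system}: mean ratio = {value:.2f}")
```

A plain mean is used here for simplicity. Whether a mean, a median, or some weighted scheme was applied to the published averages is not stated in the text.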

Inspection  of  trends  in  the  "Oils"  database 
suggests  that  some  aspects  of  the  composi¬ 


tion  of  sedimentary  hydrocarbons  appear  to 
be  relatively  invariant  with  age,  changing 
instead  with  source  rock  lithology  and  sedi¬ 
mentary  environment.  These  features  are 
mostly  reflected  in  the  abundance  patterns 
of  bacterial  and  archaeal  biomarkers.  An 
example  is  depicted  in  Figure  2,  which  plots 
the relative abundance of a diagnostic bacteriohopane
hydrocarbon (30-norhopane) as a function of paleolatitude.
30-Norhopane can arguably originate in several ways, but one
particularly prolific source would be those BHPs with a hydroxyl
substituent at position "Z" (see Figure 1); this makes them prone
to oxidative cleavage, leading to a hydrocarbon (Rohmer et al. 1992).
Such precursor BHPs occur commonly in proteobacteria, including
methylotrophs. They appear to be especially common in
carbonate-precipitating sedimentary environments; the ratio of
C29 to C30 hopane tends to be highest in oils from carbonates,
intermediate in marls, and lowest in distal shales (Subroto et al. 1991).
This relationship holds independent of the geological age of the
source rocks. Carbonates accumulate predominantly in low
latitudes, explaining the paleogeographic correspondence shown
in Figure 2.

FIGURE 2. This figure depicts a diagnostic biomarker ratio derived from the averaged analyses of numerous oil
samples representing important commercial petroleum accumulations plotted versus their paleolatitude. Samples
are grouped according to geological era and classified according to the lithology and environment of the source
rock, namely marine distal shales, marine marls, marine carbonates, or lacustrine sediments. The ratio of hopanes
with 29 carbons to those with 30 carbons tends to be highest in carbonates and marls and in samples from low
paleolatitude. Oils sourced from marine distal shales invariably show a C29/C30 hopane ratio <0.7, whereas marine
carbonates tend to have values of 0.8 or more. This pattern holds irrespective of age over the duration of the
Phanerozoic.

FIGURE 3. The secular increase in the abundance of C28 relative to C29 steranes over the Phanerozoic Eon,
particularly during the last 250 million years, in petroleum systems from the GeoMark Oils database. Overlain are the
fossil diatom species and genera diversity curves from Katz et al. (2004).

FIGURE 4. The relative abundance of aromatic dinosteranes over the Phanerozoic Eon from the GeoMark Oils
database, showing marked increase during the early Mesozoic. Note concordance with genera- and species-level
dinoflagellate fossil cyst diversity curves of Katz et al. (2004). Paleozoic occurrences of dinosteranes in petroleum
sources are infrequent but merit further attention.

FIGURE 5. (A) 2-Methylhopane (2-MeH) index, 2-MeH/(2-MeH + desmethyl hopane),
through geologic time. Values are generally elevated throughout the Precambrian, with distal shales reaching below
0.05 only during the Phanerozoic. Compilation includes data from the GeoMark Oils database and Kuypers et al.
(2004). (B) Expansion of the time scale of (A) to focus on the Phanerozoic. Source rocks deposited during oceanic
anoxic events (OAEs) are highlighted and often show elevated 2-MeH indices. (C) 2-MeH indices of Phanerozoic
petroleum systems by source rock lithology and paleolatitude. The highest 2-MeH values are generally found at
low latitudes, often in carbonate depositional environments.
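The hopane ratio discussed in connection with Figure 2 is simple to compute from measured abundances. The sketch below assumes that the inputs are abundances (e.g., peak areas) of the 29-carbon and 30-carbon hopanes in the same units, and it uses only the two thresholds quoted in the Figure 2 caption (<0.7 for marine distal shales, 0.8 or more for marine carbonates). The function names and the wording of the returned hints are hypothetical, and a ratio alone does not identify a lithology.

```python
def c29_c30_hopane_ratio(c29_hopane: float, c30_hopane: float) -> float:
    """Ratio of hopanes with 29 carbons to those with 30 carbons."""
    if c30_hopane <= 0:
        raise ValueError("C30 hopane abundance must be positive")
    return c29_hopane / c30_hopane


def lithology_hint(ratio: float) -> str:
    """Compare a ratio with the two thresholds quoted in the Figure 2 caption.

    The result is a consistency check only; other source rock types
    (e.g., marls or lacustrine sediments) can overlap these ranges.
    """
    if ratio < 0.7:
        return "consistent with marine distal shale"
    if ratio >= 0.8:
        return "consistent with marine carbonate"
    return "indeterminate"


if __name__ == "__main__":
    # Made-up peak areas for illustration.
    ratio = c29_c30_hopane_ratio(45.0, 90.0)
    print(f"C29/C30 hopane = {ratio:.2f}: {lithology_hint(ratio)}")
```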

In  contrast  to  the  trends  shown  by 
biomarkers  from  prokaryotes,  acyclic  and 
cyclized  terpenoids  and  steroids  (see  Figure 
1)  derived  from  planktonic  algae  and  vas¬ 
cular  plants  show  strong  age-related  trends 
(e.g.,  Summons  and  Walter  1990;  Brocks  and 
Summons  2004).  For  example,  the  oleanoid 
triterpenoids such as β-amyrin are important
in the predation-defense mechanisms
of flowering plants. Oleanane, a hydrocarbon
derived from these triterpenoids,
is sometimes abundant in oils from rocks
deposited on continental margins, showing
a marked increase in oils formed from
Cenozoic rocks that reflects the Cretaceous
radiation of the angiosperms (Moldowan
et al. 1994). Low levels of oleanane in rocks as
old as Jurassic age, however, suggest either
that  early  flowering  plants  were  sporadic 
inhabitants  of  Mesozoic  environments  or 
that  oleanoid  triterpenoid  synthesis  origi¬ 
nated  in  their  phylogenetic  precursors  (e.g., 
Peters  et  al.  2005).  Observations  concerning 
algal  biomarkers  are  discussed  later. 

II.  THE  RISE  OF  MODERN 
PHYTOPLANKTON 


A. Fossils and Phylogeny

In modern oceans, three algal groups
dominate primary production on continental
shelves: the diatoms, dinoflagellates, and
coccolithophorids. As detailed elsewhere
in this volume, fossils clearly suggest that
these groups all rose to taxonomic and
ecological prominence only during the
Mesozoic Era (Delwiche, Chapter 10; de
Vargas et al., Chapter 12; Kooistra et al.,
Chapter 11, this volume). Could there
have been, however, an earlier "cryptic"
evolutionary history for these groups?
For example, might nonmineralizing stem
group diatoms or haptophytes have been
ecologically important but paleontologically
uninterpretable in Paleozoic oceans?
Might the significance of Paleozoic dinoflagellates
be obscured by fossils that are abundant
and diverse but lack archeopyles?

FIGURE 6. Proterozoic and Early Cambrian protistan microfossils. (A) and (B) are Neoproterozoic leiosphaerid
acritarchs; (C-F) are Early Cambrian acritarchs (C-E) and a prasinophyte phycoma (F; Tasmanites). Scale bar = 40 microns.

Several  reports  claim  microfossil  evidence 
for  Paleozoic  diatoms,  dinoflagellates,  and 
haptophytes,  but  such  fossils  are  rare  and 
subject  to  alternative  interpretation  as  con¬ 
taminants  (the  mineralized  skeletons)  or 
different  taxa  (organic  fossils).  Stratigraphic 
research  indicates  that  the  Paleozoic  silica 
cycle  differed  substantially  from  that  of 
the  Cretaceous  and  Tertiary  periods,  with 
sponges  and  radiolarians  dominating  biolog¬ 
ical  removal  of  silica  from  the  oceans  (Maliva 
et  al.  1989).  Similarly,  sediments  preserved  in 
obducted  slices  of  Paleozoic  seafloor  at  best 
contain only limited evidence for pelagic carbonate
deposition. Such observations cannot
eliminate  the  possibility  that  rare  diatoms, 
coccolithophorids,  or  calcareous  dinoflag¬ 
ellates  lived  in  Paleozoic  oceans,  but  they 
clearly  indicate  that  these  groups  did  not 
perform  the  biogeochemical  roles  they  have 
played  since  the  Mesozoic  Era. 

Complementing  this,  molecular  clock 
estimates  for  diatom  and  coccolithophorid 
diversification,  calibrated  by  well-preserved 
fossils,  suggest  that  these  groups  have  no 
long Paleozoic "prehistory" (de Vargas et al.,
Chapter  12;  Kooistra  et  al.,  Chapter  11,  this 
volume).  On  the  other  hand,  some  molec¬ 
ular  clock  analyses  suggest  divergence 
times  for  the  plastids  in  photosynthetic 
chromalveolates  well  back  into  the  Protero¬ 
zoic  (Douzery  et  al.  2004;  Yoon  et  al.  2004). 
If  these  estimates  are  even  broadly  cor¬ 


rect,  they  must  be  accommodated  in  one  of 
two  ways.  Either  the  molecular  clocks  date 
divergence  within  a  closely  related  group  of 
unicellular  red  algae  that  subsequently  and 
individually  were  incorporated  as  plastids 
in  chromalveolate  algae,  or  photosynthetic 
chromalveolates  emerged  from  a  single 
Proterozoic  endosymbiosis  (Cavalier-Smith 
1999)  but  remained  ecologically  unimpor¬ 
tant  or  paleontologically  unrecognizable  until 
much  later. 

B.  Biomarkers  and  the  Rise  of 
Modern  Phytoplankton 

The  rise  to  ecological  prominence  of  the 
three  chlorophyll  c-containing  eukaryotic 
plankton  lineages  left  several  imprints  in 
the  molecular  fossil  record,  especially  in  the 
distributions  of  steranes  with  different  side- 
chain  alkylation  patterns.  A  secular  increase 
in the ratio of C28 to C29 steranes (24-methylcholestanes
versus 24-ethylcholestanes),
first noted by Grantham and Wakefield
(1988), has been attributed to increasing
production by chlorophyll c algae, which
dominate C28 sterane input, relative to green
algae, which synthesize primarily C29 steroids
(Volkman 2003). An updated plot of the
C28/C29 sterane ratio versus geological age,
based on averages from 123 petroleum systems
worldwide, is shown in Figure 3 along
with data for diatom diversity. The C28/C29
sterane ratio remains below 0.4 in the Neoproterozoic
and, with one exception, below
0.7 through the Paleozoic. Corresponding
to the diversification of diatoms in the
later half of the Mesozoic, there is a rise in
the C28/C29 sterane ratio to values as high
as 1.8, followed by an apparent drop in the
Paleocene and Eocene. Values climb again in
the Miocene, accompanying a second rise in
the numbers of diatom genera and species.
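As a worked illustration of the sterane ratio described above, the sketch below computes the ratio of 24-methylcholestanes (C28) to 24-ethylcholestanes (C29) and compares it with the upper bounds quoted in the text for averaged petroleum systems. The input abundances are made up, the function and dictionary names are hypothetical, and the comparison is a consistency check only, not a dating method.

```python
def c28_c29_sterane_ratio(c28_abundance: float, c29_abundance: float) -> float:
    """Ratio of 24-methylcholestanes (C28) to 24-ethylcholestanes (C29)."""
    if c29_abundance <= 0:
        raise ValueError("C29 sterane abundance must be positive")
    return c28_abundance / c29_abundance


# Upper bounds quoted in the text for averaged petroleum systems.
REPORTED_UPPER_BOUNDS = {
    "Neoproterozoic": 0.4,
    "Paleozoic": 0.7,
    "later Mesozoic": 1.8,
}


def intervals_within_reported_range(ratio: float) -> list:
    """Names of intervals whose quoted upper bound is at or above the ratio."""
    return [name for name, upper in REPORTED_UPPER_BOUNDS.items() if ratio <= upper]


if __name__ == "__main__":
    # Made-up abundances (e.g., peak areas) for illustration.
    ratio = c28_c29_sterane_ratio(120.0, 200.0)
    print(f"C28/C29 sterane ratio = {ratio:.2f}")
    print("Not above the quoted bound for:", intervals_within_reported_range(ratio))
```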

Other  biomarkers  show  marked  increases 
in  abundance  in  the  Cretaceous  that  also 
likely  reflect  diatom  radiation.  These  include 
24-norcholestanes  (Holba  et  al.  1998a,  b), 
the  so-called  highly  branched  isoprenoids 
(HBIs; Sinninghe Damste et al. 1999a, b; Belt
et al. 2000; Allard et al. 2001), long-chain
diols, and mid-chain hydroxyl methylalkanoates
(Sinninghe Damste et al. 2003).
The secular increase in 24-norcholestane
abundance (Moldowan et al. 1991; Holba
et al. 1998a, b) was observed and linked to
the diatom radiation well before a precursor
sterol was recognized in a culture of the
centric diatom Thalassiosira aff. antarctica
(Rampen et al. 2005). In fact, in the study
of Rampen et al. (2005), only one of 100
different diatom taxa, T. aff. antarctica,
was found to produce
24-norcholesta-5,22-dien-3β-ol.

The  detail  of  novel  sterol  production  by 
diatoms  provides  an  interesting  window 
into  the  connections  among  biomarkers, 
taxonomy,  and  the  physiological  roles  of 
lipids.  In  a  recent  study  of  diatom  sterols, 
Suzuki et al. (2005) reported that environmental
samples of diatoms collected from
the  North  Pacific  Ocean  and  the  Bering  Sea 
contained  24-norsterols.  Further,  samples 
of  the  diatom  Coscinodiscus  marginatus,  ini¬ 
tially  devoid  of  24-norcholesterol,  contained 
significant  amounts  of  this  and  the  related 
steroid 27-nor-24-methylcholesta-5,22-dien-3β-ol
after storage at 3°C for 30 days. These
authors  attributed  the  latter  change  to  bac¬ 
terial  biodegradation.  Due  to  the  ubiquity 
of  24-norsteranes  in  Mesozoic  sediments 
(Holba  et  al.  1998a,  b),  and  no  other  evidence 
for  selective  side-chain  biodegradation  of 
sterols  (biodegrading  bacteria  are  unlikely 
to  select  for  removal  of  and  of 
the  apparent  precursor  24-methylcholesta- 
5,22-dien-3β-ol and leave other sterols untouched),
it seems far more likely that there
is  a  direct,  as  opposed  to  diagenetic,  source 
for  the  24-norsteroids  in  sediments.  The 
rarity  of  24-norsterols  in  cultured  diatoms 
(Rampen  et  al.  2005)  more  likely  reflects 
the  fact  that  sterol  biosynthesis  responds  to 
physiological  conditions  and  that  laboratory 
culture  conditions  have  as  yet  not  mimicked 
the  natural  conditions,  such  as  low  tempera¬ 
ture  and,  perhaps,  low  light  under  which 
24-norsterols are produced by some diatoms.
The enigma surrounding the origins of


24-norsteroids,  and  the  ultimate  detection  of 
24-norcholesta-5,22-dien-3β-ol by Rampen et
al.  (2005)  in  a  cold  water  diatom  species,  pro¬ 
vide  a  timely  reminder  that  biomarkers  not 
only  reflect  the  presence  of  particular  algal 
taxa  but  also  reflect  the  environmental  con¬ 
ditions  under  which  they  thrive.  A  corollary 
to  this  is  that  cultured  organisms  might  not 
always  produce  the  same  assemblage  of  lip¬ 
ids  as  their  counterparts  growing  under  natu¬ 
ral  conditions. 

The  other  major  class  of  diatom-specific 
biomarkers  is  the  HBI.  So  far  as  is  currently 
known,  the  occurrence  of  HBI  is  confined 
to  four  genera,  namely  Navicula,  Haslea, 
and  Pleurosigma  within  the  pennates,  and 
Rhizosolenia  among  the  centrics  (Volkman 
et  al.  1994;  Belt  et  al.  2000;  Sinninghe  Damste 
et  al.  2004).  Both  molecular  phylogeny  and 
fossils  indicate  that  centric  diatoms  predate 
pennates  (Kooistra  et  al.  2006).  Therefore, 
the  genus  Rhizosolenia  is  considered  the 
likely  source  of  the  first  recorded  fossil  HBI 
at about 91 million years ago (Ma), which
predates recorded fossil tests of rhizosolenid
diatoms by about 20 million years (Sinninghe
Damste et al. 2004). This discrepancy
in  timing  could  reflect  incomplete  paleon¬ 
tological  sampling,  which  systematically 
underestimates  first  appearances,  HBI  syn¬ 
thesis  by  a  morphologically  distinct  stem 
group  relative  of  the  rhizosolenids,  or  both. 
Biosynthetic  pathways  are  such  a  funda¬ 
mental  characteristic  of  organisms  that  they 
might  be  detectable  through  chemical  fos¬ 
sils  before  the  first  classical  fossils  of  a  clade 
are  ever  recognizable. 

Dinosteroid  biomarkers,  derived  from  the 
4-methylsterols  of  dinoflagellates  (Robinson 
et  al.  1984),  show  an  analogous  pattern  of 
secular  increase  in  the  Mesozoic,  in  accord 
with microfossil evidence for later Triassic
dinoflagellate radiation. As in the case
of  diatom  HBI,  however,  several  reported 
occurrences  of  dinosteranes  predate  fossil 
cysts,  in  some  cases  by  hundreds  of  millions 
of  years  (Summons  et  al.  1992;  Moldowan 
and  Talyzina  1998;  Talyzina  et  al.  2000). 
These  deserve  close  attention  as  they  may 
establish a genuine pre-Mesozoic history of
dinoflagellates, as predicted by molecular
phylogenies and clocks. Some aspects of this
putative history are considered further later.
In Figure 4, data derived from the "Oils"
database show the pattern of secular variation
in triaromatic dinosteroid abundances,
along with a recent compilation of dinoflagellate
cyst diversity (Katz et al. 2004).

The third group of modern plankton, the
haptophytes, produces distinctive lipids in
the form of long-chain (n-C37 to C39) unsaturated
ketones known as alkenones (Volkman
et  al.  1980;  Marlowe  et  al.  1984).  These  can  be 
abundant  and  easily  recognized  in  Recent 
and  Cenozoic  sediments  that  have  not  expe¬ 
rienced  extensive  diagenesis,  and  they  form 
the  basis  of  a  widely  used  paleotempera- 
ture  proxy  (Brassell  et  al.  1986).  Their  dis¬ 
tinctiveness,  however,  lies  in  the  carbonyl 
functionality  and  one  to  four  unsaturations, 
which  are  easily  reduced  and  inherently 
unstable  over  geological  time  scales  (Prahl 
et  al.  1989).  Thus,  the  oldest  reported  detec¬ 
tion  is  in  Cretaceous  sediments  (Farrimond 
et  al.  1986),  and  we  would  not  expect  to  be 
able  to  recognize  them  in  Paleozoic  or  older 
rocks. 

The  molecular  and  morphological 
records  of  eukaryotic  predominance  in  shelf 
primary production are mirrored by indications
of relatively low cyanobacterial contributions.
Some cyanobacteria are known to
biosynthesize BHPs and analogues with an
extra methyl group attached to the 2 position
of the A ring (2-MeBHP); the hydrocarbon
cores of these molecules provide a
potentially useful tracer for cyanobacterial
input to sedimentary organic matter (Summons
et al. 1999). Apart from a few notable
exceptions, values of the 2-methylhopane
index (2-MeHI; the fraction of hopanoids
methylated at the 2 position relative to their
desmethyl counterparts) tend to be higher in
Proterozoic samples than in Paleozoic
and, especially, Jurassic and younger oils
(Figure 5A). This is especially true of samples
from shales. (Examination of the Phanerozoic
record [Figure 5B] shows that higher
2-MeHI values are found in oils from carbonate
lithologies, formed predominantly
at low paleolatitudes [Figure 5C].) The
exceptions  are  associated  with  widespread 
anoxia  in  the  oceans  (Figure  5B).  Kuypers  et 
al.  (2004a,  b)  and  Dumitrescu  and  Brassell 
(2005)  studied  biomarkers  associated  with 
Cretaceous  oceanic  anoxic  events  (OAEs) 
and  found  that  the  relative  abundances  of 
2-methylhopanoids, as measured by the
2-MeHI, were distinctively enhanced, along
with nitrogen (N) isotopic evidence for
cyanobacterial primary productivity (Kuypers
et al. 2004b). Mass extinction at the
Permian-Triassic boundary is also associated
with widespread anoxia in shallow
oceans (Wignall and Twitchett 2002). In Figure 5,
the two data points for the Permian-Triassic
transition represent unpublished
data from the Perth Basin, Australia, and the
boundary stratotype in Meishan, China; in
these sections, high 2-MeHI values correlate with
independent molecular, iron speciation,
and sulfur isotopic evidence for intense
euxinia (Grice et al. 2005). The highest
2-MeHI value (0.29) recorded in the GeoMark
Oils database comes from the Larapintine
Petroleum System, Australia, which
includes oils from the Late Devonian reef
complex of the Canning Basin (Edwards et
al. 1997) sourced from black shales deposited
near the Frasnian-Famennian boundary,
another event characterized by geological
and geochemical evidence for pervasive
euxinia (e.g., Bond et al. 2004). The samples
from the Cretaceous OAEs, Permian-Triassic
boundary, and Frasnian-Famennian
shale all contain abundant isorenieratane
and  aryl  isoprenoids  derived  from  the 
brown  pigmented  strains  of  the  green  sul¬ 
fur  bacteria  (Chlorobiaceae),  considered 
diagnostic  for  photic  zone  euxinia  (e.g., 
Summons  and  Powell  1987;  Koopmans  et 
al.  1996;  Kuypers  et  al.  2004a;  Grice  et  al. 
2005;  van  Breugel  et  al.  2005).  Kuypers  et  al. 
(2004b)  hypothesize  that  the  N  cycle  was 
compromised  while  euxinic  conditions 
prevailed  during  the  Cretaceous  OAEs, 
creating an unusual opportunity for the
proliferation of N-fixing cyanobacteria.
The  disparate  occurrences  described  pre¬ 
viously  suggest  a  more  general  correlation 
between high 2-MeHI and photic zone
euxinia,  a  topic  we  return  to  in  our  discus¬ 
sion  of  Proterozoic  primary  production. 
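The 2-methylhopane index used throughout this discussion is the fraction of hopane methylated at the 2 position, that is, 2-MeH/(2-MeH + desmethyl hopane) as given in the Figure 5 caption. A minimal sketch of the calculation follows. The input abundances are made up, the function name is hypothetical, and the choice of which hopane homologues to integrate is left to the caller as an assumption.

```python
def two_meh_index(two_methylhopane: float, desmethyl_hopane: float) -> float:
    """2-Methylhopane index: 2-MeH / (2-MeH + desmethyl hopane).

    Both inputs are abundances (e.g., peak areas) in the same units.
    """
    if two_methylhopane < 0 or desmethyl_hopane < 0:
        raise ValueError("abundances must be non-negative")
    total = two_methylhopane + desmethyl_hopane
    if total == 0:
        raise ValueError("at least one abundance must be positive")
    return two_methylhopane / total


if __name__ == "__main__":
    # Made-up abundances chosen to reproduce values mentioned in the text:
    # 0.29 (highest value cited) and a value below 0.05.
    for two_meh, hopane in ((29.0, 71.0), (4.0, 96.0)):
        print(f"2-MeHI = {two_meh_index(two_meh, hopane):.2f}")
```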

Although  cyanobacteria  appear  to  have 
been minor contributors to primary production
on most Mesozoic and Cenozoic continental
shelves, they remain the dominant
phytoplankton  in  open-ocean,  oligotrophic 
environments  today.  Whether  this  is  a 
recent  or  long-standing  situation  is  difficult 
to  discern  given  the  paucity  of  the  deep- 
sea  sedimentary  records  and  the  absence  of 
2Me-BHP  in  cyanobacterial  picoplankton 
(Summons,  unpublished  data). 

C.  Summary  of  the  Rise  of  Modern 
Phytoplankton 

Fossils, molecular biomarkers, molecular
clocks for individual clades, and the sedimentary
silica record all tell a consistent
story: the modern phytoplankton has Mesozoic
roots. How we interpret this transformation
depends in no small part on what
we think came before.

III.  PALEOZOIC  PRIMARY 
PRODUCTION 


A.  Microfossils 

Microfossils of presumptive eukaryotic
phytoplankton are both abundant and
diverse in Paleozoic marine rocks (Figure
6C-F). A number of forms, including Tasmanites,
Pterospermella, and Cymatiosphaera,
have morphologies and ultrastructures that
relate them to prasinophyte phycomata
(Tappan 1980). Indeed, in well-studied
microfossil assemblages from Lower Cambrian
shales, at least 20% of described
morphotaxa and more than half of all individual
fossils are likely prasinophytes (e.g.,
Volkova et al. 1983; Knoll and Swett 1987;
Moczydlowska 1991). Others, with regularly
distributed processes, tantalizingly
resemble dinocysts, but lack archeopyles.
Still other acritarchs (the group name given
to closed, organic-walled microfossils of
uncertain systematic relationships) (Evitt
1963) do not closely resemble known cysts
of modern phytoplankton. Collectively,
these microfossils show evidence of marked
Cambrian and Ordovician radiations that
parallel the two-stage diversification of
marine animals (Knoll 1989). For reasons
that remain obscure, acritarch diversity
drops near the end of the Devonian and
remains low for the remainder of the Paleozoic
Era (Molyneux et al. 1996).

Moldowan and Talyzina (1998) innovatively
attempted to break the phylogenetic
impasse regarding Cambrian microfossils.
Extracts from fossiliferous clays of the
Lower Cambrian Lükati Formation, Estonia,
contain low abundances of dinosterane
and 4α-methyl-24-ethylcholestane, both
known to originate from the sterols of
dinoflagellates (Robinson et al. 1984; Summons
et al. 1987). Moldowan and Talyzina
(1998) divided the microfossil populations
in a Lükati sample into three groups:
tasmanitids (see Figure 6F; interpreted as
prasinophyte phycomata), a low fluorescence
group dominated by leiosphaerid acritarchs
(also possible phycomata of prasinophytes
like the extant Halosphaera), and a high
fluorescence fraction containing abundant
process-bearing acritarchs (e.g., see Figure
6D). They then analyzed the sterane content of
these subassemblages. The tasmanitid and
low fluorescence fractions contained relatively
abundant C29 steranes but little or no
dinoflagellate lipid. In contrast, the high
fluorescence fraction contained relatively high
abundances of dinosterane, suggesting that
the dominant, process-bearing acritarchs are
dinocysts sans archeopyles. It is not clear that
sterols play a structural role in cyst walls,
making selective adsorption a real possibility.
Moreover, whereas dinosterane and
4α-methyl-24-ethylcholestane abundances
in these samples are relatively high, their
concentrations are absolutely low. Thus,
the specific attribution of acritarch taxa to
the dinoflagellates remains speculative.
Nonetheless, these analyses do clearly suggest
that dinoflagellates were present in
coastal Cambrian oceans and may have left
a morphological as well as biogeochemical
record. Given the low abundances of dinoflagellate
biomarkers, it is possible that the
constituent dinoflagellates were largely
heterotrophs, not primary producers.

B.  Paleozoic  Molecular  Biomarkers 

The  molecular  fossil  record  prior  to  the 
rise  of  the  chlorophyll  c  lineages  broadly 
corroborates  the  microfossil  evidence  for 
the  occurrence  and  potential  ecological 
importance of other eukaryotic phytoplankton
in the Paleozoic. In particular, the high
abundance of C29 steranes relative to C27 and
C28 homologues suggests a greater role for
green algae in marine primary production
at this time. This signal is observed in globally
distributed rocks and petroleum systems
from the latest Neoproterozoic into the
Paleozoic and wanes in the later Paleozoic,
although  the  depositional  bias  for  much 
of  this  time  is  toward  low  paleolatitudes 
(Figure  7).  In  a  study  of  tasmanite  oil  shales 
from  different  locations  in  Tasmania,  Revill 
et al. (1994) found that C27 and C29 steranes
were present in roughly equal abundance
and dominated over C28. All the samples
were shales with a high total organic carbon
content, with the visible organic matter
primarily comprising Tasmanites punctatus
microfossils. These early Permian deposits,
which were geographically localized, contained
abundant dropstones and evidence
of low-temperature minerals, which were
clearly glacial in origin. Thus, Tasmanites
punctatus may have occupied an ecological
niche similar to that occupied by modern
sea-ice diatom communities (Revill et al. 1994).

FIGURE 7. Dominance of C29 steranes in late Neoproterozoic/Early Cambrian petroleum systems. Paleogeography
shows concentration of depositional systems to low paleolatitudes during this time. Continental reconstruction
redrawn after C. Scotese, available at www.scotese.com.
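
The sterane comparisons discussed above rest on the relative abundances of the C27, C28, and C29 homologues. As a minimal sketch of that arithmetic, and not a description of the procedure used in any of the cited studies, the Python fragment below normalizes three peak areas to percentages and reports the most abundant homologue. The function names and the peak areas in the usage lines are hypothetical and chosen only for illustration.

```python
def sterane_percentages(c27: float, c28: float, c29: float) -> dict[str, float]:
    """Normalize three sterane peak areas so that they sum to 100%."""
    total = c27 + c28 + c29
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return {
        "C27": 100.0 * c27 / total,
        "C28": 100.0 * c28 / total,
        "C29": 100.0 * c29 / total,
    }


def dominant_homologue(percentages: dict[str, float]) -> str:
    """Return the homologue with the largest relative abundance."""
    return max(percentages, key=percentages.get)


# Hypothetical peak areas (arbitrary units), for illustration only.
example = sterane_percentages(c27=20.0, c28=10.0, c29=70.0)
print(example, dominant_homologue(example))
```

Percentages of this kind are what a ternary sterane plot displays; a C29 share well above the other two corresponds to the C29 predominance described in the text.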

Dinosteranes are generally below detection
in the Paleozoic marine sediments and
oils  that  have  been  examined  and  reported 
to  date.  In  contrast,  triaromatic  dinosteroids 
have  been  found  in  significant  abundance 
in  several  lower  Paleozoic  sedimentary 
rocks  and  petroleum  samples.  This  speaks 
to  the  occurrence  of  either  stem  or  crown 
group  dinoflagellates  in  Paleozoic  oceans. 
An additional factor in observed dinosterane
abundances may be preservational bias.
Saturated  dinosteroids,  dinosteranes,  may 
only  be  preserved  under  strongly  reducing 
conditions.  A  wider  search  for  triaromatic 
dinosteroids,  and  authentication  of  the 
Paleozoic petroleum data through reanalysis
and careful checking of pedigrees, may
expose a more extensive pattern of occurrence,
and hence, a richer early history for
these  plankton  than  is  evident  from  the 
distribution  of  fossilized  cysts. 

C.  Paleozoic  Summary 

Biomarker data for oils and some sediments
suggest that dinoflagellates existed
in Paleozoic oceans, but with few exceptions,
lipids thought to be sourced by dinoflagellates
occur in low abundances, raising
the question whether Paleozoic dinoflagellates
functioned to any great extent as
primary producers. The same is true of
possible stem-group heterokonts. Thus,
although Chl a+c phytoplankton may well
have existed in Paleozoic oceans, they do
not appear to have played anything like the
ecological role they have assumed since the
Mesozoic Era began. In contrast, microfossils
and biomarker molecules both suggest that
green algae played a greater role in marine
primary production than they have in the
past 100 million years, and biomarkers also
suggest a significant role for cyanobacterial
production on continental shelves. Macrofossils,
predominantly of calcareous skeletons,
further indicate that red and green
algae were ecologically important in the
shallow shelf benthos (Wray 1977).

IV.  PROTEROZOIC  PRIMARY 
PRODUCTION 


Fossils (whether morphological or molecular)
are less abundant in Proterozoic rocks
than they are in Phanerozoic samples, and
Proterozoic sedimentary rocks, themselves,
are less abundantly preserved than their
younger counterparts. Nonetheless, fossils
have been reported from hundreds of Proterozoic
localities (Mendelson and Schopf
1992), allowing us to recognize at least broad
patterns of stratigraphic and paleoenvironmental
distribution. Indeed, Proterozoic
micropaleontology has developed to the
point where it has become predictive, in the
sense that knowledge of age and environmental
setting permits reasonable prediction
about the fossil content of a given rock
sample (e.g., Knoll et al. 2006).

A.  Prokaryotic  Fossils 

By the earliest Proterozoic Eon, cyanobacteria
must have been important contributors
to primary production; there
is no other plausible source for the O2 that
began to accumulate in the atmosphere and
surface oceans 2.45-2.32 billion years ago
(Ga). Consistent with this observation, it
has been appreciated since the early days
of Precambrian paleontology that cyanobacteria-like
microfossils are abundant and
widespread constituents of Proterozoic fossil
assemblages (Figure 8; Schopf 1968). Not
all cyanobacteria have diagnostic morphologies,
but some do and others are likely candidates
for attribution given knowledge of
taphonomy (processes of preservation) and
depositional environments represented in
the record. By mid-Proterozoic times, if not
earlier, all major clades of cyanobacteria
existed in marine and near-shore terrestrial
environments, including those that differentiate
akinetes and heterocysts (Tomitani et al.
2006). The best-characterized Proterozoic
cyanobacteria come from early diagenetic
chert nodules in carbonate successions (e.g.,
Schopf 1968; Zhang 1981; Knoll et al. 1991;
Sergeev et al. 1995, 2002; Golubic and Seong-Joo
1999). These fossils are largely benthic
and largely coastal marine. Stromatolites,
however, indicate a much wider distribution
of benthic cyanobacteria in the photic zone.
(A role for cyanobacteria or of organisms in
general is difficult to establish in the precipitated
stromatolites found in Earth's oldest
well-preserved sedimentary successions;
however, the likelihood that cyanobacteria
were major architects of Proterozoic stromatolites
that accreted primarily by trapping
and binding is high) (Grotzinger and Knoll
1999). Microfossils are less useful for evaluating
the contributions of cyanobacteria to
the phytoplankton of Proterozoic oceans
because many were small, nondescript, and
likely to settle on the seafloor in places where
interpretable preservation was improbable.
Given the distribution of planktonic clades
on a phylogenetic tree calibrated by well-documented
fossils, however, it is likely that
cyanobacteria were important constituents
of the phytoplankton in Proterozoic oceans
(Sanchez-Baracaldo et al. 2005; Tomitani et al.
2006; see later).

FIGURE 8. Cyanobacteria in Proterozoic sedimentary rocks. (A) 700-800-million-year-old endolithic pleurocapsalean
fossil. (B) 1500-million-year-old mat-building cyanobacterium closely related to modern Entophysalis.
(C) 1500-million-year-old short trichome. (D) Spirulina-like fossil in latest Proterozoic (<600 Ma) phosphorite.
Scale bar = 15 microns in (A-C), and = 25 microns in (D).

B.  Eukaryotic  Fossils 

Two problems shadow attempts to understand
the Proterozoic history of photosynthetic
eukaryotes. Given the polyphyletic
evolution of at least simple unicellular and
multicellular characters, convergence complicates
interpretation of many Proterozoic
protistan fossils. In addition, given the oft-observed
reality that stem group organisms
display only a subset of the characters that
collectively identify crown group members
of clades, early fossils may challenge finer-scale
systematic attribution, even though
they may be unambiguously eukaryotic.

Despite these problems, a small number
of fossil populations provide calibration
points for eukaryotic phylogenies. Bangiomorpha
pubescens (Butterfield 2000) is a large
population of multicellular microfossils
found in tidal flat deposits of the ca. 1200 Ma
Hunting Formation, Arctic Canada. These
erect filaments, preserved via rapid burial
by carbonate mud and subsequent silicification,
display patterns of thallus organization,
cell division, and cell differentiation
that ally them to the bangiophyte red algae.
Complementing this, a moderate diversity
of cellularly preserved florideophyte red
algal thalli occurs in <600 Ma phosphorites
of the Ediacaran Doushantuo Formation,
China (Xiao et al. 2004). Shifting to another
branch of the eukaryotic tree, several taxa
of vase-shaped microfossils in the ca. 750 Ma
Kwagunt Formation, Grand Canyon, Arizona,
can be related to lobose testate amoebae,
placing a minimum constraint on the
timing of amoebozoan divergence (Porter
et al. 2003).

Accepting the presence of red algae by
1200 Ma, one might expect to observe green
algal fossils in younger Proterozoic rocks.
Several candidate taxa have been described,
of which Proterocladus, a branching coenocytic
thallus organized much like living
Cladophora, is most compelling (Butterfield
et al. 1994). Palaeovaucheria clavata, described
from >1005 ± 4 Ma shales in Siberia (Herman
1990), as well as ca. 750-800 Ma shales
from Spitsbergen (Butterfield 2004), has a
branching filamentous morphology and
pattern of reproductive cell differentiation
very similar to that of the extant xanthophyte
alga Vaucheria. Kooistra et al. (Chapter
11, this volume) speculate that this similarity
arose via convergence in a green algal
clade; either interpretation places the origin
of green algae earlier than 1000 Ma.

Fossils show that eukaryotic photoautotrophs
were present in the benthos no
later than the Mesoproterozoic Era (1600-1000 Ma),
but what about the phytoplankton?
Unicellular taxa occur in all three
divisions of the Plantae, making it likely
that such cells existed by the time that Bangiomorpha
evolved. Of these, however, only
the phycomate prasinophytes are likely to
have left a tractable fossil record in marine
sedimentary rocks. As noted previously,
the three major types of ornamented phycoma
known from living prasinophytes
have fossil records that extend backward
to the Early Cambrian, but there is little
evidence of earlier origin. In contrast,
extant Halosphaera develop smoothly spheroidal
phycomata that could easily be represented
among leiosphaerid acritarchs in
Proterozoic rocks (see Figure 6A and B; Tappan
1980). Ultrastructural and microchemical
studies (e.g., Javaux et al. 2004; Marshall
et al. 2005) provide our best opportunity to
test this hypothesis.

Chromalveolates may be recorded
in a very different way. In 1986, Allison
and Hilgert reported small (7-40 μm in
maximum dimension), apparently siliceous
ovoid scales in cherts of the Tindir Group,
northwestern Canada, now judged to be
>635 and <710 Ma (Kaufman et al. 1992).
The scales resemble those formed by living
prymnesiophytes and (at a smaller size
range) chrysophytes, likely documenting
early diversification somewhere within the
chromalveolate branch of the eukaryotic
tree.

Fossils of any kind are rare in rocks
older than about 2000 million years, but
unambiguous fossils of eukaryotes occur
in shales as old as 1650-1850 Ma (Knoll
et al. 2006); little is known about their
systematic relationships or physiology.
Compilations of total diversity (e.g., Knoll
1994; Vidal and Moczydlowska 1997),
assemblage diversity (Knoll et al. 2006),
and morphospace occupation (Huntley
et al. 2006) through time agree that a moderate
diversity of eukaryotic organisms
existed in Mesoproterozoic oceans. By
ca. 1200 Ma if not earlier, this diversity
included photosynthetic eukaryotes. Diversity
appears to have increased modestly in
the Neoproterozoic, but the major radiations
within preservable seaweed and
phytoplankton groups took place only at
the end of the Proterozoic Era and during
the ensuing Cambrian and Ordovician
Periods (Knoll et al. 2006).

C. Proterozoic Molecular Biomarkers

Rocks containing organic matter amenable
to biomarker analysis grow increasingly
rare as we sample more deeply into the Proterozoic,
and many of those available have
undergone extensive heating such that only
the most recalcitrant molecules remain.
Nevertheless, a molecular fossil record of
primary production is emerging for this long
interval of Earth history. A major feature of
the record is the high relative abundance
of 2-MeHI in organic-rich distal shales
throughout the Proterozoic (see Figure 5),
which, in conjunction with the microfossil
record and geochemical evidence for oxic
surface waters in oceans, provides strong
evidence for the importance of cyanobacterial
production. There has been such limited
sampling  of  Paleo-  and  Mesoproterozoic 
sediments  that  it  has  not  been  possible  to 
examine  these  for  correlations  with  lithology, 
as  has  been  accomplished  for  the  Phanerozoic 
(see  Figure  5C). 

Steranes have been reported from a number
of Proterozoic successions (e.g., Summons
and Walter 1990; Hayes et al. 1992; Dutkiewicz
et al. 2003) and are generally found in
low abundance, reflecting at least in part the
thermal maturity of their host rocks. These
molecular fossils establish the presence of
eukaryotes in Proterozoic oceans, but the
scarcity of detailed records limits the inferences
that can be drawn concerning ecological
role or taxonomic affinities (because
group-distinctive markers are generally
below detection limits). The geologic record
of steroid biosynthesis extends into the Late
Archean, several hundred million years
before the first recognized protistan fossils.
However, there continue to be doubts
about the syngeneity of these steroids
because of the advanced maturity of all the
sections studied so far and because of the
potential for the bitumens found there to
have migrated from younger sequences or
to be contaminants from drilling and handling
(e.g., Brocks et al. 2003a, b). In contrast, the
Roper and McArthur Basins of northern
Australia contain rocks of low-to-moderate
thermal maturity, more consistent with the
probability of finding genuinely syngenetic
biomarkers. Given that studies of the Roper
and McArthur Basin sediments and oils
consistently show the presence of steroids
(Summons et al. 1988a, b; Dutkiewicz et al.
2003) along with other evidence for the in
situ (Summons et al. 1994) character of the
bitumens, there seems little doubt that steroid
biosynthesis operated as long ago as
1640 Ma. Preservation is a major limitation
for  both  body  and  molecular  fossil  records 
at this point. Nonetheless, sterane abundances
in rocks of this age appear to be low,
independent of maturity level, and do not
approach Phanerozoic abundances until the
Neoproterozoic Era. Based on a few exceptionally
well-preserved deposits of organic
material in Mesoproterozoic shales, there
appear to have been times and places where
producer communities were very different
from those that characterize later periods.
Brocks et al. (2005) have reported biomarkers
of anaerobic, sulfide-utilizing phototrophs
in the carbonate facies of the Barney Creek
Formation, Australia, suggesting that euxinic
waters extended well into the photic
zone. Molecular markers of eukaryotes
and cyanobacteria in those portions of the
Barney Creek Formation are exceptionally
scarce, raising the possibility that, in at least
some environments during the Proterozoic,
production by anoxygenic photoautotrophs
may have been quantitatively important.
In fact, the scarcity of steroids and 2-methylhopanoids
in samples with the most
abundant biomarkers for phototrophic sulfur
bacteria is also consistent with the highly
euxinic conditions they require. The extent
to which this scenario reflects global versus
local conditions awaits further elucidation,
but it is consistent with geochemical proxies
for oceanic redox conditions, observed
globally (e.g., Logan et al. 1995; Arnold et al.
2003; Shen et al. 2003; Gellatly and Lyons
2005).

In contrast to the scarcity of suitable
organic-rich rocks in the Paleoproterozoic
and Mesoproterozoic successions, the Neoproterozoic
is replete with well-characterized
organic matter in low maturity sections
from Australia, North America, Oman (e.g.,
Grantham et al. 1987) and eastern Siberia
(Summons and Powell 1992; see Summons
and Walter 1990; Hayes et al. 1992, for
reviews). Of particular note are the oldest
commercial petroleum accumulations in
Siberia and Oman. These late Neoproterozoic
oils display striking biomarker patterns
characterized by particularly abundant
steroidal hydrocarbons. Predominance of
C29 steranes over other homologues is a feature
of oils from the South Oman Salt Basin
that has received much attention since it
was first reported by Grantham (1986).
Examination of Neoproterozoic petroleum
samples worldwide suggests that this is a
globally significant feature (see Figure 7)
that records the rise of green algae to ecological
prominence. Further, samples that
show the strong predominance of C29 steranes
are also generally characterized by
anomalously light carbon isotopic compositions,
in the range of -33 to -37‰ PDB. It
is likely no coincidence that the oldest commercial
petroleum deposits bear the prominent
signature of a green algal contribution
to petroleum-prone organic matter and that
some green algae are known for their capacity
to biosynthesize decay-resistant aliphatic
biopolymers in their cell wall (algaenans;
Derenne et al. 1991, 1992; Gelin et al. 1996,
1997, 1999), a likely source of acyclic hydrocarbons
in these oils (e.g., Hold et al. 1999).

D.  Summary  of  the  Proterozoic  Record 

Microfossil and biomarker records are
consistent in showing that cyanobacteria
and eukaryotic microorganisms were
both present in Proterozoic oceans. Fossils
indicate that the primary endosymbiotic
event establishing the photosynthetic Plantae
took place no later than ca. 1200 Ma,
in broad agreement with molecular clock
estimates appropriately ornamented by
error estimates (Hackett et al., Chapter 7,
this volume). Thus, eukaryotic algae contributed
to primary production during
at least the last 600 million years of the
Proterozoic Era. Yet, preserved biomarkers
are dominated by cyanobacteria and
other photosynthetic bacteria, suggesting
that eukaryotes played a limited quantitative
role in primary production. Increasing
amounts of steranes appear in later
Neoproterozoic samples, typified by the
high sterane-to-hopane ratios and strong
sterane predominances in oils from the South
Oman Salt Basin and eastern Siberia (e.g.,
Grantham 1986; Summons and Powell 1992);
this suggests that green algae began to play
an increasing role in primary production by
600-700 Ma. The timing of this transition is
not well constrained but, in Oman, it begins
prior to the Marinoan glaciation and extends
to the Neoproterozoic-Cambrian boundary
(Grosjean et al. 2005, and unpublished data),
falsifying the hypothesis that the green algal
proliferation was a response to the Acraman
impact event in Australia (McKirdy et
al. 2006). In short, algae may have emerged
as major contributors to global primary production
only during the late Neoproterozoic
to Early Paleozoic interval distinguished by
marked increases in fossil diversity.

V.  ARCHEAN  OCEANS 

We evaluate the Archean geobiological
record cautiously, as available data are
sparse.  Sedimentary  rocks  are  limited  in 
volume,  especially  for  the  early  Archean, 
and  most  surviving  strata  have  been  altered 
by  at  least  moderate  metamorphism.  Thus, 
any  interpretation  must  be  provisional. 

The expectation from both phylogeny and
the Proterozoic biogeochemical record is that
prokaryotic primary producers are likely
to have governed early marine ecosystems.
Cyanobacteria have the ecological advantage
of obtaining electrons from ubiquitous
water molecules, but there is no reason to
believe that cyanobacteria were the primordial
photoautotrophs (see Blankenship
et al., Chapter 3, this volume). Indeed, the
question of when cyanobacteria, with their
coupled photosystems, evolved remains
contentious. In an early ocean dominated
by anoxygenic photobacteria, the availability
of electron donors (Fe2+, H2, H2S) would
have limited primary production (Kharecha
et al. 2005).

The four principal lines of evidence used
to construct an evolutionary history of Proterozoic
oceans apply equally to the Archean
record: microfossils, biomarker molecules,
sedimentary textures that record microbe/sediment
interactions on the ancient seafloor
(e.g., stromatolites), and stable isotopic
signatures (Knoll 2003b). Few microfossils
have been reported from Archean cherts and
shales. Somewhat poorly preserved fossils
occur in latest Archean cherts from South
Africa (Lanier 1986; Klein et al. 1987; Altermann
and Schopf 1995); these could include
cyanobacteria, but other alternatives cannot
be rejected. More controversial are the
nearly 3500 Ma carbonaceous microstructures
interpreted as bacterial, and possibly
cyanobacterial, trichomes by Schopf (1993).
Recently, not only their systematic interpretation
but their fundamental interpretation
as biogenic has been called into question
(Brasier et al. 2002, 2005, 2006). Debate about
these structures continues (e.g., Schopf et al.
2002a, b, in response to Brasier et al. 2002),
but few believe that these structures, whatever
their origin, provide phylogenetic or
physiological insights into early life.

The stromatolite record is similar. At
least 40 occurrences of stromatolites have
been reported from Archean rocks (Schopf
2006), not a lot given that the record is
1 billion years long. Those younger than
about 3000 Ma include structures that accreted
by the trapping and binding of fine particles;
such textures are more or less uniformly
associated with microbial activity.
Bedding surfaces on siliciclastic rocks of
comparable age similarly include textures
attributable to microbial mat communities
(Noffke et al. 2006). Older stromatolites are
largely precipitated structures whose biogenicity
is harder to establish. Conoidal
forms in ca. 3450 Ma rocks from Western
Australia (Hofmann et al. 1999; Allwood et al.
2006) and "roll-up structures" (sediment
sheets that were ripped up and rolled into
a cylinder by currents, suggesting microbially
mediated cohesion of poorly lithified
laminae) in comparably old rocks from
South Africa (Tice and Lowe 2004) may well
require biological participation to form, but
the taxonomic and physiological nature of
the participants remains uncertain (see Tice
and Lowe 2006, for an argument that anoxygenic
photobacteria fueled Early Archean
mat ecosystems).

No biogeochemically informative organic
molecules are known from Early Archean
rocks. Late Archean biomarkers have
been reported; controversy surrounding
their identification and interpretation has
two distinct aspects. The first relates to
their provenance and whether or not all
the organic matter present in ancient sediments
is coeval, as recognized by Brocks et al.
(2003a, b). This question can best be addressed
through studies of cores recently drilled
and curated under controlled conditions. For
example, the Agouron-Griqualand Paleoproterozoic
Drilling Project (AGPDP) and
the  NASA  Astrobiology  Institute  Drilling 
Project  (ABDP)  have  recovered  fresh  cores 
from  South  Africa  and  the  Pilbara  Craton  of 
Western  Australia,  respectively,  which  are 
being  studied  for  a  range  of  paleobiologic 
proxies,  including  analyses  of  preserved 
organic  matter.  One  aim  of  this  research  is 
to  control  or  eliminate  contamination  by 
hydrocarbons  from  younger  sediments; 
a  second  aim  is  to  test  for  relationships 
between  extractable  hydrocarbons  and  rock 
properties  that  could  not  exist  in  the  case  of 
contamination. 

'Ihe  second  aspect  of  the  controversy 
revolves  around  the  degree  to  which  bio¬ 
synthetic  pathways  may  have  evolved  over 
long  time  scales.  It  is  fair  to  state  that  there 
must  have  been  evolution  in  the  structure 
and  function  of  lipids  over  geological  time. 
However,  key  enzymes  in  the  biosynthetic 
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pathways  leading  to  sterols  (Summons 
et  al.  2006)  and  other  triterpenoids  in  extant 
organisms  are  highly  conserved.  The  known 
geological  record  of  molecular  fossils,  espe¬ 
cially  steranes  and  triterpanes,  is  notable 
for  the  limited  number  of  structural  motifs 
that  are  recorded.  With  a  few  exceptions, 
the  carbon  skeletons  are  the  same  as  those 
found  in  the  lipids  of  extant  organisms,  and 
no  demonstrably  extinct  structures  have 
been  reported.  Furthermore,  the  patterns 
of  occurrence  of  sterane  and  triterpane  iso¬ 
mers  are  rigid  over  billion-year  time  scales 
and  correlate  strongly  with  environments  of 
deposition,  suggesting  that  diagenetic  path¬ 
ways  connecting  functional  lipids  to  their 
fossil  biomarker  counterparts  are  also  con¬ 
served.  We  also  have  evidence,  through  the 
occurrence  of  rearranged  steranes  (diaster- 
anes)  and  unconventional  steroids  such  as 
the  2-alkyl  and  3-alkyl  steranes  (Summons 
and  Capon  1988, 1991;  van  Kaam-Peters  et  al. 
1998)  and  their  aromatic  counterparts  (Dahl 
et  al.  1995),  that  fossil  steranes  originated 
from  precursors  that  carried  a  3-hydroxyl 
group  and  unsaturation  in  the  tetracyclic 
ring  system,  as  extant  sterols  do.  Thus,  there 
is  no  evidence  for  major  changes  in  the 
known  record  of  chemical  fossils  that  could 
be  attributed  to  the  inception,  evolution, 
or alternative lipid biosynthetic pathways
to  the  24-alkylated  steroids  or  hopanoids 
(Kopp  et  al.  2005).  Accordingly,  if  biomark¬ 
ers  that  have  been  identified  are  confirmed 
to  be  indigenous  to  late  Archean  rocks,  this 
will  constitute  robust  evidence  for  the  pres¬ 
ence  of  algae  and  bacteria  early  in  Earth  his¬ 
tory.  The  fact  that  molecular  oxygen  is  an 
absolute  requirement  for  the  biosynthesis  of 
algal  sterols  also  implies  that  oxygenic  pho¬ 
tosynthesis  must  have  been  present  at  the 
time  (Summons  et  al.  2006). 

The  carbon  isotopic  abundances  of  Early 
Archean  carbonates  and  organic  matter 
are  comparable  to  those  of  younger  rocks, 
indicating fractionation like that imparted
by Rubisco-based autotrophy. Indeed,
C-isotopic signatures that are consistent with
carbon  fixation  by  Rubisco  extend  backward 


to  nearly  3800 Ma  metamorphosed  sediments 
from  southwestern  Greenland  (Rosing  and 
Frei  2004).  The  question  is  whether  these  sig¬ 
natures  require  such  an  interpretation.  Other 
biochemical  pathways  for  carbon  fixation 
exist  and  at  least  some  of  them  impart  iso¬ 
topic  signatures  that  are  equally  consistent 
with Archean data (e.g., Knoll and Canfield
1998).  It  has  been  known  for  2  decades  that 
abiotic  syntheses  of  organic  matter,  like  that 
demonstrated  by  Miller  (1953),  fractionate 
C-isotopes  (Chang  et  al.  1982);  the  degree 
of  fractionation  appears  to  vary  widely  as  a 
function of initial conditions. More recently,
McCollom and Seewald (2006) have shown
that Fischer-Tropsch-type (FTT) synthesis
can produce organic compounds depleted
in ¹³C relative to their carbon source to a
degree similar to that associated with biological
carbon fixation. In these experiments,
formic acid was reacted with native
iron at 250 °C and 325 bar, and a series of
n-alkanes were produced that were depleted
by ~36‰ relative to the reactant carbon.
This isotopic discrimination is in the range
observed for the difference in δ¹³C values of
coexisting  carbonate  minerals  and  organic 
matter  in  some  Archean  deposits.  This  find¬ 
ing  emphasizes  the  importance  of  under¬ 
standing  the  depositional  context  (e.g., 
sedimentary  versus  hydrothermal)  of  this 
very  ancient  carbonaceous  matter  when 
assessing  its  biogenicity. 

We  conclude  that  the  origin  of  life  predates 
the  known  record  of  preserved  sedimentary 
rocks,  but  the  nature  of  that  life — and,  in 
particular,  the  nature  of  primary  produc¬ 
ers  in  the  oceans — remains  uncertain.  All 
known  geobiological  records  from  Archean 
rocks  are  consistent  with  an  early  evolution 
of  cyanobacteria,  but  few  if  any  require  such 
an  interpretation  (Knoll  2003a).  Indeed, 
Kopp  et  al.  (2005)  have  hypothesized  that 
cyanobacteria  originated  only  in  association 
with  the  initial  accumulation  of  free  oxygen 
in  the  atmosphere,  2320-2450  Ma  (Holland 
2006).  Careful  geobiological  analyses  of 
well-preserved  Archean  rocks  remain  a 
priority  for  continuing  research. 
In combination, paleontological and
organic  geochemical  data  suggest  that  the 
second  half  of  Earth  history  can  be  divided 
into  three  major  eras,  with  respect  to  marine 
photosynthesis.  Limited  data  from  Paleo- 
proterozoic  and  Mesoproterozoic  rocks 
suggest  that  cyanobacteria  and  other  pho¬ 
tosynthetic  bacteria  dominated  primary 
production  at  that  time,  with  anoxygenic 
photosynthetic  bacteria  playing  an  impor¬ 
tant  role,  at  least  locally,  in  water  masses 
subtended  by  a  euxinic  oxygen-minimum 
zone.  Indeed,  available  data  suggest  that 
cyanobacteria  continued  as  principal  pho¬ 
toautotrophs  well  into  the  Phanerozoic  Eon 
and  long  after  photosynthesis  originated 
in eukaryotic cells. Sterane abundances
indicate  that  green  algae  joined,  but  did  not 
entirely  displace,  cyanobacteria  as  major 
primary  producers  during  the  latest  Prot¬ 
erozoic  and  Cambrian;  the  second  phase  of 
primary  production  history  thus  initiated 
persisted  until  the  Mesozoic  radiation  of 
modern  phytoplankton  dominants.  Later 
Triassic  oceans  may  have  been  the  first  in 
which  cyanobacteria  played  a  relatively 
minor  role  in  continental  shelf  production. 
(Of  course,  they  remain  important  today 
in  the  open  gyre  systems  little  recorded 
by  pre-Jurassic  sedimentary  rocks.)  The 
degree to which Chl a+c algae participated
in Neoproterozoic and Paleozoic
marine  ecosystems  remains  unresolved,  but 
if  present  their  role  must  have  been  much 
smaller  than  it  has  been  during  the  past  200 
million  years. 

The  observation  that  the  oceans  have 
experienced  two  major  shifts  over  the  past 
billion  years  in  the  composition  of  primary 
producers,  and  the  corollary  that  at  least 
some  clades  emerged  as  ecologically  domi¬ 
nant  primary  producers  long  after  their 
evolutionary  origin,  invites  discussion  of 
possible  drivers.  The  importance  of  cyano¬ 
bacteria  in  Proterozoic  primary  production 
can be attributed to at least two circumstances:
their early diversification and environmental
circumstances in Proterozoic
oceans.  Prior  to  the  proliferation  of  eukary¬ 
otic  algae,  cyanobacteria  would,  of  course, 
have  had  an  open  playing  field,  flourishing 
in  oxygenated  surface  waters  from  coast¬ 
line  to  mid-ocean  gyres,  although  ceding 
deeper,  at  least  intermittently  euxinic  parts 
of  the  photic  zone  to  green  and  purple  pho¬ 
tosynthetic  bacteria.  Why,  however,  does 
it  appear  that  cyanobacteria  continued  as 
dominant  features  of  the  photosynthetic 
biota  on  continental  shelves  long  after  red 
and  green  algae  entered  the  oceans?  At  least 
in  part,  the  answer  may  have  to  do  with 
the  nutrient  structure  of  oceans  in  which, 
beneath  an  oxygenated  surface  layer,  the 
oxygen minimum zone (Brocks et al. 2005),
if not the entire deep ocean (Canfield 1998),
had a high propensity for developing
euxinia. Under these conditions, one would
expect little fixed N to resurface during
upwelling (Anbar and Knoll 2002; Fennel et al.
2005), providing strong selective advantage
for cyanobacteria able to fix N and scavenge
low concentrations of fixed N effectively
from  seawater. 

Increasing  oxygenation  of  the  oceans 
during  the  Neoproterozoic  Era  (Canfield 
et  al.  2006;  Fike  et  al.  2006)  would  have 
begun  to  alleviate  the  N  budget,  as  the  mid¬ 
level  waters  that  source  upwelling  would 
have  been  increasingly  likely  to  remain 
oxic,  limiting  denitrification  and  anammox 
reactions  that  strip  fixed  N  from  ascend¬ 
ing  anoxic  water  masses.  More  ammonium 
would  have  been  returned  to  the  surface, 
and  nitrate  would  have  begun  to  accu¬ 
mulate  for  the  first  time.  In  consequence, 
eukaryotes  would  have  spread  more  com¬ 
pletely  across  benthic  environments  and 
into  the  phytoplankton,  as  recorded  in  the 
geological  record  (Knoll  et  al.  2006). 

Dinoflagellates,  diatoms,  and  coccolitho- 
phorids  exhibit  many  features  that  collectively 
account  for  their  ecological  success  in  mod¬ 
ern  oceans  (Delwiche,  Chapter  10;  de  Vargas 
et  al.,  Chapter  12;  Kooistra  et  al.,  Chapter  11, 
this  volume).  Why,  then,  do  we  not  see  evi¬ 
dence  for  similar  success  in  Paleozoic  seas? 
One possibility is that the secondary endosym-
bioses  that  led  to  these  groups  took  place  only 
at  the  beginning  of  the  Mesozoic  Era  or  shortly 
earlier. Such a scenario is consistent with clade-specific
molecular clocks for diatoms and
coccolithophorids but is inconsistent with the
hypothesis that secondary endosymbiosis
involving red algal photosymbionts occurred
only once, in the early history of the
chromalveolates (Hackett et al., Chapter 7, this volume).
Regardless  of  the  timing  of  clade  origination, 
however,  we  need  to  consider  environmental 
factors,  for  the  simple  reason  that  it  is  hard  to 
conceive of biological barriers that would have
prevented secondary endosymbiosis long before
the  Mesozoic  began. 

Black  shale  distributions  may  provide  per¬ 
spective  on  this  issue.  Multiregional  to  glo¬ 
bally  widespread  black  shales  are  essentially 
absent  from  Cenozoic  successions  but  occur 
at  about  seven  discrete  stratigraphic  horizons 
in  the  Mesozoic  record  (Jones  and  Jenkyns 
2001).  In  contrast,  there  are  at  least  seven  black 
shale  horizons  in  the  Devonian  record  alone 
and  many  more  in  other  parts  of  the  Paleo¬ 
zoic,  especially  the  Cambrian  and  Ordovician 
(Berry  and  Wilde  1978).  Prior  to  the  dawn  of 
the  Cambrian,  most  shales  were  carbonaceous 
(e.g., Knoll and Swett 1990; Abbott and Sweet
2000). If the redox structure of the oceans influenced
the selective environment of green versus
Chl a+c phytoplankton, then it may be that
only  in  Mesozoic  oceans  did  environmental 
conditions  routinely  favor  the  latter.  As  noted 
previously,  fossils  and  biomarkers  indicate 
that  greens  and  cyanobacteria  transiently  re¬ 
established  themselves  as  principal  primary 
producers  during  the  Mesozoic  OAEs;  green 
sulfur bacteria also proliferated during episodes
of  photic  zone  euxinia.  Moreover,  unlike  chro- 
malveolate  photoautotrophs,  both  green  algae 
and  cyanobacteria  show  a  pronounced  prefer¬ 
ence  for  ammonium  over  nitrate  in  metabo¬ 
lism  (Litchman,  Chapter  16,  this  volume). 
Thus,  the  long-term  redox  evolution  of  the 
oceans  may  govern  the  composition  of  marine 
primary  producers  through  time. 

Whatever  their  drivers,  the  two  observed 
transitions  in  the  marine  photosynthetic 


biota  provide  an  important  framework  and 
stimulus  for  continuing  paleobiological 
investigations  of  animal  evolution.  Latest 
Proterozoic  and  Cambrian  phytoplankton 
radiation  may  not  simply  be  a  response 
to  animal  evolution  (e.g.,  Peterson  and 
Butterfield  2005)  but  also  a  driver.  Well- 
documented  (Bambach  1993)  increases  in 
body  size  among  Mesozoic  (versus  Paleo¬ 
zoic)  marine  invertebrates  may  reflect  the 
Mesozoic  radiation  of  larger  net  plankton, 
while  the  so-called  Mesozoic  Marine  Revo¬ 
lution  among  mostly  Cretaceous  and  Ceno¬ 
zoic  marine  animals  may  specifically  reflect 
the  rise  to  ecological  prominence  of  diatoms 
(see  Finkel,  Chapter  15,  this  volume).  Vermeij 
(1977)  first  documented  the  major  evo¬ 
lutionary  changes  in  skeletonized  marine 
fauna  during  this  interval  and  ascribed  it 
to  a  late  Mesozoic  radiation  of  predators 
able  to  penetrate  shells.  Bambach  (1993), 
however,  argued  that  the  required  radia¬ 
tion  of  top  predators  could  only  occur  as  a 
consequence of increased primary production
and, hence, increased nutrient status in
the  oceans.  Bambach  (1993)  suggested  that 
evolving  angiosperms  increased  nutrient 
fluxes  to  the  oceans,  and  although  this  likely 
did  occur  (see  Knoll  2003c),  the  evolution 
of  a  high-quality  food  source  and  efficient 
nutrient  transporter  in  the  form  of  diatoms 
likely  played  at  least  an  equal  role. 

A.  Directions  for  Continuing  Research 

Over  the  past  decade,  both  paleontolo¬ 
gists  and  organic  geochemists  have  made 
inroads  into  problems  of  photosynthetic 
history.  Nonetheless,  there  continue  to 
be  more  questions  than  answers.  Future 
research  will  require  more  and  independ¬ 
ent  studies  of  fossils  and  hydrocarbon  dis¬ 
tribution  on  Archean  and  Proterozoic  rocks. 
However,  it  will  also  require  phylogenetic, 
biosynthetic,  and  functional  studies  of  ster¬ 
ols  and  BHPs  (especially  2-Me-BHP)  in  liv¬ 
ing  organisms  that  will  increase  our  ability 
to  interpret  ancient  records.  In  comparable 
fashion, continuing research on Proterozoic
and Paleozoic microfossils will need
to  stress  wall  ultrastructure  (Arouri  et  al. 
1999, 2000;  Talzyina  2000;  Javaux  et  al.  2004) 
and  microchemical  analysis  (e.g.,  FTIR, 
hydropyrolysis,  x-ray  and,  perhaps,  Raman 
spectroscopy;  Love  et  al.  1995;  Schopf  et  al. 
2002a,  b;  Boyce  et  al.  2003;  Marshall  et  al., 
2005)  interpreted  in  light  of  corresponding 
analyses  of  living  cells  and  younger,  taxo- 
nomically  unambiguous  fossils. 

Finally,  as  noted  previously,  the  evolu¬ 
tion  of  photosynthetic  organisms  did  not 
take  place  in  a  passive  or  unchanging  ocean 
nor did it occur in an ecological vacuum.
Improved  understanding  of  Earth's  redox 
history  and  the  evolutionary  record  of  ani¬ 
mals  (and  land  plants)  (Falkowski  et  al. 
2004)  will  provide  the  framework  needed  to 
interpret  the  evolutionary  history  of  marine 
photoautotrophs  as  it  continues  to  emerge. 
The carbon cycle and associated redox
processes  through  time 

John M. Hayes¹,* and Jacob R. Waldbauer²

¹Department of Geology and Geophysics, Woods Hole Oceanographic Institution, Woods Hole, MA 02543, USA

²Joint Program in Chemical Oceanography, Woods Hole Oceanographic Institution, and Massachusetts
Institute of Technology, Cambridge, MA 02139, USA

Earth’s  biogeochemical  cycle  of  carbon  delivers  both  limestones  and  organic  materials  to  the  crust.  In 
numerous,  biologically  catalysed  redox  reactions,  hydrogen,  sulphur,  iron,  and  oxygen  serve 
prominently  as  electron  donors  and  acceptors.  The  progress  of  these  reactions  can  be  reconstructed 
from records of variations in the abundance of ¹³C in sedimentary carbonate minerals and organic
materials. Because the crust is always receiving new CO2 from the mantle and a portion of it is being
reduced by photoautotrophs, the carbon cycle has continuously released oxidizing power. Most of it
is represented by Fe³⁺ that has accumulated in the crust or been returned to the mantle via
subduction. Less than 3% of the estimated, integrated production of oxidizing power since 3.8 Gyr
ago is represented by O2 in the atmosphere and dissolved in seawater. The balance is represented by
sulphate. The accumulation of oxidizing power can be estimated from budgets summarizing inputs of
mantle carbon and rates of organic-carbon burial, but levels of O2 are only weakly and indirectly
coupled to those phenomena and thus to carbon-isotopic records. Elevated abundances of ¹³C in
carbonate  minerals  ca  2.3  Gyr  old,  in  particular,  are  here  interpreted  as  indicating  the  importance  of 
methanogenic  bacteria  in  sediments  rather  than  increased  burial  of  organic  carbon. 

Keywords:  carbon  cycle;  carbon  isotopes;  atmospheric  oxygen;  methanogenesis; 
subduction;  mantle 


Together,  biological  and  geological  processes — 
oxygenic  photosynthesis  and  the  burial  of  organic 
carbon — get  credit  for  producing  and  maintaining  the 
O2  in  Earth’s  breathable  atmosphere.  This  view  of  the 
carbon  cycle  as  the  engine  of  environmental  evolution 
is  based  on  sedimentary  records.  The  disappearance  of 
mass-independent  fractionation  of  the  isotopes  of 
sulphur  is  the  most  reliable  indicator  for  the  accumu¬ 
lation  and  persistence  of  traces  of  O2  in  the  atmosphere 
beginning  at  about  2.4  Gyr  ago  (Ga)  (Po2^  10~^  atm; 
Pavlov  &  Kasting  2002;  Farquhar  &  Wing  2003; 
Bekker  et  al.  2004).  Biomarkers  derived  from  lipids 
associated with cyanobacteria first appear in sedimentary
rocks with an age of 2.7 Gyr (Summons et al. 1999;
Brocks  et  al.  2003;  Eigenbrode  2004).  These  provide 
strong  evidence  for  oxygenic  photoautotrophy  at  that 
time  (i.e.  for  production  of  O2  as  opposed  to  its 
accumulation,  persistence,  and  global  distribution). 
Abundances  of  oxidized  and  reduced  minerals  in 
ancient  soil  profiles  and  sediments  generally  indicate  a 
transition  from  weakly  reducing  to  weakly  oxidizing 
conditions  at  Earth’s  surface  soon  after  2.47  Ga  (Bekker 
et  al.  2004;  Canfield  2005;  Catling  &  Claire  2005). 

But  what  about  examining  the  engine  itself? 
Records  of  the  burial  of  organic  carbon,  which 
should be provided by abundances of ¹³C in
sedimentary carbonates and organic material
(Broecker  1970),  could  indicate  the  pace  and  the 
mechanism  of  oxidation.  For  events  and  processes 
during  the  past  500  Myr,  carbon-isotopic  records 
have  been  interpreted  with  considerable  success  (e.g. 
Garrels  &  Lerman  1981;  Holland  1984;  Berner 
1991,  2004;  Kump  &  Arthur  1999).  The  same 
approach  has  been  extended  to  Precambrian  records 
(e.g.  Schidlowski  et  al.  1975;  Hayes  1983,  1994; 
Knoll  et  al.  1986;  Derry  et  al.  1992;  Karhu  & 
Holland  1996;  Halverson  et  al.  2005),  but  con¬ 
clusions  have  usually  been  qualitative  rather  than 
quantitative.  The  carbon-isotopic  record  can  be 
described  as  ‘consistent  with’  some  postulated  event 
or  process,  but  understanding  of  the  carbon  cycle 
has  not  been  complete  enough  to  allow  resolution  of 
uncertainties  or  elaboration  of  details. 

Two  recent  findings  may  change  this.  First,  new 
evidence  (Saal  et  al.  2002)  has  led  to  wide  agreement 
(cf.  Resing  et  al.  2004)  on  the  rate  at  which  C  is 
delivered  to  the  crust  from  the  mantle  at  mid-ocean 
ridges.  This  significant  reduction  in  uncertainties  about 
the  input  allows  a  new  approach  to  carbon  budgets. 
Second,  a  previously  overlooked  output  of  carbon  from 
the  ocean  has  been  recognized  and  quantified.  Whereas 
earlier  concepts  limited  the  outputs  to  sedimentary 
carbonates  and  organic  matter,  the  new  view  includes  a 
very  large  flow  of  carbon  that  is  taken  up  during  the 
weathering  of  seafloor  basalts.  This  changes  the 
structure  of  the  mass  balances.  The  potential  role  of 
this  production  of  ‘carbonated  basalts’  in  the  carbon 
[Figure 1. A schematic of the biogeochemical cycle of carbon. The arrows represent fluxes of carbon that are explained in the text. Diagram labels: exogenic reaction chamber; gases; deep sediments; oceanic crust.]


cycle  was  first  identified  by  Staudigel  et  al.  (1989). 
Subsequently,  Alt  &  Teagle  (1999,  2003)  have 
examined  amounts  and  isotopic  compositions  of  these 
carbonate  minerals  and  discussed  their  importance  in 
the  modern  carbon  cycle.  Walker  (1990),  Sleep  & 
Zahnle  (2001),  and  Nakamura  &  Kato  (2004)  have 
called  attention  to  carbonation  of  submarine  basalts  as 
an  important  phenomenon  during  the  Archaean. 
Bjerrum  &  Canfield  (2004)  have  introduced  a 
systematic  treatment  of  this  problem. 

Our  purpose  here,  based  on  those  developments,  is 
to  demonstrate  a  new  approach  to  studies  of  the  carbon 
cycle  and  its  effects  on  the  global  environment.  Among 
those  effects,  we  focus  on  oxidation  of  Earth’s  surface 
over  the  past  4  Gyr. 

1.  THE  STRUCTURE  OF  THE  CARBON  CYCLE 

A  geochemical  view  of  the  carbon  cycle  is  shown  in 
figure 1. The scheme chosen highlights processes that
control  the  isotopic  composition  of  inorganic  carbon 
dissolved  in  the  ocean.  That  carbon  pool  is  in  rough 
equilibrium  with  the  atmosphere  and  with  carbonate 
minerals  derived  from  seawater.  It  is  the  senior  author 
of  our  best  records  of  how  the  carbon  cycle  has 
operated  over  the  course  of  Earth  history.  To  interpret 
those records, we must consider the fluxes indicated in
figure  1.  They  represent  processes  that  are  linked  by 
balances  of  mass  and  electrons.  The  related  equations 
are,  at  present,  analytical  tools  rather  than  components 
of  an  elaborate  model. 

The  reaction  chamber  in  which  isotopic  variations 
are  shaped  is  comprised  of  the  atmosphere,  hydro¬ 
sphere,  and  C-exchanging  sediments  and  soils.  Carbon 
flows  into  that  chamber  from  the  mantle  and  by 
recycling  of  carbon  within  the  crust.  It  leaves  through 
burial  in  sediments  and  during  weathering  of  seafloor 
basalts.  If  the  amount  of  C  in  the  reactor  is  constant, 
the  inputs  will  be  balanced  by  the  outputs: 

$$J_{am} + J_{ar} + J_{or} + J_{av} + J_{ov} = J_{ab} + J_{ob} + J_{aw} \qquad (1.1)$$

The terms in this equation represent fluxes of carbon in
mol per time. The subscripts ‘a’ and ‘o’ refer to
inorganic and organic carbon. In detail, $J_{am}$ is the total
input of C from the mantle via outgassing of magmas at
seafloor hydrothermal vents and at hot-spot and island-arc
volcanoes; $J_{ar}$ and $J_{or}$ are, respectively, returns of
carbonate and organic C by exposure and weathering of
deposits on continents and shelves; and $J_{av}$ and $J_{ov}$ are
returns, principally by arc volcanism, of carbonate and
organic C remobilized during subduction. Among the
outputs, $J_{ab}$ and $J_{ob}$ are the net burials of carbonate and
organic C in marine sediments, deriving mainly from
processes in surface waters but often bearing strong
secondary imprints; and $J_{aw}$ is carbonate being taken
up at the seafloor during the weathering of basalts.
These, and all additional definitions pertinent to this
discussion, are summarized in table 1.

A second mass balance equates incoming and
outgoing ¹³C. Its elaboration leads to a key indicator
of variations in the operation of the carbon cycle. For
simplicity and generality, it is often written in this form:

$$J_i \delta_i = J_{ab}\delta_{ab} + J_{ob}\delta_{ob} + J_{aw}\delta_{aw} \qquad (1.2)$$

where $J_i$ represents the summed inputs to the exogenic
reaction chamber, $\delta_i$ represents their weighted-average
isotopic composition and the remaining $\delta$ terms
represent the isotopic compositions of the indicated
fluxes. In detail, a fully correct form would be

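$$J_{am}\delta_{am} + J_{ar}\delta_{ar} + J_{or}\delta_{or} + J_{av}\delta_{av} + J_{ov}\delta_{ov} = J_{ab}\delta_{ab} + J_{ob}\delta_{ob} + J_{aw}\delta_{aw} \qquad (1.3)$$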

Because values for many of the terms on the left side of
this equation are inaccessible, analyses usually proceed
from equation (1.2), incorporating the assumptions
that $J_i = J_{ab} + J_{ob} + J_{aw}$ and that $\delta_i = \delta_{am}$. When historical
variations are considered, the latter has two
components: (i) that mixing of recycling inputs
eventually yields an unbiased sample and (ii) that $\delta_{am}$
is constant. Over rock-cycle time-scales of 300 Myr or
more, the first requirement is probably met. Second,
the constancy and uniformity of $\delta_{am}$ are well supported.
Independent of age of emplacement or location,
diamonds from peridotitic xenoliths have $\delta = -5$‰
(Pearson et al. 2004). The same value is found in
carbonatites and mantle-derived basalts (Kyser 1986;
Mattey 1987).
Table 1. Definitions.

term          definition

main variables
$A_{Ox}$      rate of accumulation of oxidants in crust (mol O2 equivalent) per time
$J$           flux of carbon (mol per time)
$L$           flux of oxidant or reductant (mol O2 equivalent) per time
$M$           molar quantity of C (no subscript: total crustal carbon, all forms)
$f$           fraction of C buried in organic form ($= J_{ob}/(J_{ab} + J_{aw} + J_{ob})$)
$j_x$         flux of substance x (mol per time)
$\Delta_g$    isotopic difference between DIC in surface seawater and diagenetically stabilized carbonate minerals in sedimentary rocks (‰); see equation (1.4)
$\Delta_m$    isotopic difference between DIC in surface seawater and carbonate in weathered oceanic basalts (‰); see equation (1.6)
$\gamma$      fraction of crustal C recycling during $\tau$; $\gamma = 1 - e^{-k\tau}$, where $k = (\ln 2)/(\text{half-mass age})$
$\delta$      δ¹³C relative to the Vienna PeeDee Belemnite standard (Zhang & Li 1990)
$\varepsilon$ isotopic fractionation between DIC in surface seawater and sedimentary organic carbon (‰); see equation (1.5)
$\varphi$     Fe³⁺/ΣFe
$\lambda$     fraction of buried carbonate accounted for by ocean-crustal carbonates ($= J_{aw}/(J_{aw} + J_{ab})$)
$\tau$        time-step in numerical integrations (10^ years), figures 5, 7 and 9

subscripts appended to $J$, $L$ and $\delta$

first or only part
Ox            oxidant
Red           reductant
a             carbonate carbon
i             input to exogenic reaction chamber; see equation (1.3)
o             organic carbon

second part, pertaining to a directional flux
b             burial, transfer from exogenic reaction chamber to deep sediments
c             transfer from deep sediments to continental crust
d             transfer from subduction zone to mantle
h             transfer from ocean crust to continental crust
m             transfer from mantle to exogenic reaction chamber
r             (returns) transfer from continent to exogenic reaction chamber
s             (subduction) transfer from deep sediments to subduction zone
v             (volcanism) transfer from subduction zone to exogenic reaction chamber
w             (weathering) transfer from exogenic reaction chamber to ocean crust
x             transfer from exogenic reaction chamber to space
z             transfer from ocean crust to subduction zone


The task now is to provide a useful approach
to interpreting observed variations in $\delta_{ab}$, the carbon-isotopic
composition of sedimentary carbonates. As
has become conventional, we define the fraction of
input C buried in organic form as $f = J_{ob}/J_i$. Following
Bjerrum & Canfield (2004), we define the fraction
of total carbonate accounted for by ocean-crustal
carbonates as $\lambda = J_{aw}/(J_{aw} + J_{ab})$.

Our approach to the isotopic variables differs from
previous expositions. As the reference point, we choose
the isotopic composition of total dissolved inorganic
carbon (DIC) in marine surface waters, $\delta_a$. The
isotopic compositions of the outputs are related to $\delta_a$
by the following expressions:

$$\delta_{ab} = \delta_a - \Delta_g \qquad (1.4)$$

$$\delta_{ob} = \delta_a - \varepsilon \qquad (1.5)$$

$$\delta_{aw} = \delta_a - \Delta_m \qquad (1.6)$$

where $\Delta_g$ is the globally averaged isotopic difference
between surface DIC and diagenetically stabilized
sedimentary carbonates. At present, for example,
comparison of pre-industrial $\delta_a$ (Quay et al. 2003)
and average sedimentary carbonate (Shackleton 1987)
indicates $\Delta_g \approx 2$‰. Local variations in $\Delta_g$ can affect
specific sedimentary records. The difference in isotopic
composition between sedimentary organic carbon and
DIC, $\varepsilon$, is principally (though not exclusively) due to
isotopic discrimination during biotic carbon fixation.
The sign chosen in equation (1.5), with $\varepsilon > 0$ corresponding
to depletion of ¹³C in biomass, is conventional
in marine biogeochemistry. Values of $\Delta_m$ will be
positive when ocean-crustal carbonates are depleted in
¹³C relative to surface waters. Reports of $\delta_{aw}$, required
to evaluate $\Delta_m$, are rare. Alt & Teagle (2003) find $\delta_{aw}$ =
1.7 ± 0.4‰ for ocean-crustal carbonates that have
formed during the past 160 Myr. The average value
of $\delta_{ab}$ during the same interval (Veizer et al. 1999) is
1.7‰. At present, therefore, $\Delta_m = \Delta_g = 2$‰. Other
reports occasionally indicate lower values of $\delta_{aw}$ and
thus suggest larger values of $\Delta_m$ but, based on
associated sedimentary features, the authors uniformly
attribute the depletion of ¹³C to infrequent additions of
C derived from oxidation of organic material.

Substitution  of  equations  (1.4)-(1.6)  in  equation 
(1.3)  and  simplification  of  the  result  yields 

$$\delta_{ab} - \delta_i = f(\varepsilon - \Delta_g) + \lambda(1 - f)(\Delta_m - \Delta_g). \qquad (1.7)$$
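
The algebra behind equation (1.7) is not spelled out in the text. One way to recover it is sketched here, assuming the simplified input balance $J_i = J_{ab} + J_{ob} + J_{aw}$ together with the definitions of $f$ and $\lambda$ given above; the intermediate lines are a reconstruction, not the authors' own working.

```latex
% Partition of the input flux implied by the definitions of f and \lambda
J_{ob} = f J_i, \qquad
J_{aw} = \lambda (1 - f) J_i, \qquad
J_{ab} = (1 - \lambda)(1 - f) J_i

% Isotopic mass balance divided through by J_i
\delta_i = f\,\delta_{ob} + (1 - f)\left[\lambda\,\delta_{aw} + (1 - \lambda)\,\delta_{ab}\right]

% Substitute \delta_{ab} = \delta_a - \Delta_g,\;
%            \delta_{ob} = \delta_a - \varepsilon,\;
%            \delta_{aw} = \delta_a - \Delta_m
\delta_i = \delta_a - f\varepsilon - (1 - f)\left[\lambda\,\Delta_m + (1 - \lambda)\,\Delta_g\right]

% Subtract from \delta_{ab} = \delta_a - \Delta_g and collect terms
\delta_{ab} - \delta_i
  = f\varepsilon - \Delta_g + (1 - f)\left[\lambda\,\Delta_m + (1 - \lambda)\,\Delta_g\right]
  = f(\varepsilon - \Delta_g) + \lambda (1 - f)(\Delta_m - \Delta_g)
```

The reference composition $\delta_a$ cancels in the last step, which is why equation (1.7) contains only isotopic differences.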
For $\Delta_g = 0$, equation (1.7) is equivalent to equation
(1.4) of Bjerrum & Canfield (2004). If, in addition,
either or both $\Delta_m$ and $\lambda$ are zero, equation (1.7)
becomes $\delta_{ab} - \delta_i = f\varepsilon$, the expression found in numerous
prior discussions of isotopic fractionation in the carbon
cycle. Does this expression indicate that $\Delta_m$ and $\Delta_g$ can
be as effective as $f$ and $\varepsilon$ in controlling $\delta_{ab}$? Probably
not. Their leverage is relatively small. In most
circumstances, the first term, $f(\varepsilon - \Delta_g)$, will be at least
four times larger than the second, $\lambda(1 - f)(\Delta_m - \Delta_g)$.

Rearrangement  of  equation  (1.7)  yields  an 
expression  for  /,  the  fraction  of  carbon  buried  in 
organic  form: 


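$$f = \frac{\delta_{ab} - \delta_i - \lambda(\Delta_m - \Delta_g)}{\varepsilon - \Delta_g - \lambda(\Delta_m - \Delta_g)} \qquad (1.8)$$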


Commonly (e.g. Hayes et al. 1999), f is estimated from
(δ_ab − δ_i)/(δ_ab − δ_ob). This is precisely equivalent to
(δ_ab − δ_i)/(ε − Δ_g). The numerator and denominator in
that fraction lack the correction terms, −λ(Δ_m − Δ_g),
which are prominent in equation (1.8). If Δ_m = Δ_g, as at
present, λ(Δ_m − Δ_g) = 0 and the terms are inconsequential,
independent of the importance of basalt carbonation.
Such cases are summarized graphically in
figure 2a. The vertical line at δ_ab − δ_i = 6.5‰ indicates
δ_ab = 1.5‰, near Earth's observed, long-term average
value. The lines correspond (left to right) to ε = 22, 26,
30 and 34‰. The shaded area indicates that,
depending on the value of ε, δ_ab = 1.5‰ corresponds
to 0.2 < f < 0.32. This range encompasses the values
most frequently noted in previous discussions of the
carbon cycle. In particular, δ_ab − δ_ob = 28‰, corresponding
to ε = 30 and Δ_g = 2‰, is representative of
much of the Phanerozoic (Hayes et al. 1999).
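
As a rough numerical check on the range quoted above, equation (1.8) can be evaluated directly. The sketch below is illustrative only: the function and argument names are assumptions introduced here, λ is taken to weight the basalt-carbonation term, and the value λ = 0.5 is an arbitrary placeholder that drops out whenever Δ_m = Δ_g.

```python
def organic_burial_fraction(d_ab_minus_d_i, eps, delta_g, delta_m, lam):
    """Equation (1.8): organic burial fraction f from isotopic terms (permil).

    All argument names are hypothetical; lam stands for lambda in the text.
    """
    correction = lam * (delta_m - delta_g)
    return (d_ab_minus_d_i - correction) / (eps - delta_g - correction)


# Figure 2a case: delta_m = delta_g = 2 permil, delta_ab - delta_i = 6.5 permil.
for eps in (22.0, 26.0, 30.0, 34.0):
    f = organic_burial_fraction(6.5, eps, delta_g=2.0, delta_m=2.0, lam=0.5)
    print(f"eps = {eps:.0f} permil -> f = {f:.3f}")
```

With these inputs f falls between roughly 0.20 and 0.33, in line with the shaded range described for figure 2a.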

The lower f values marked by shading in the other
frames of figure 2 correspond to the same value of δ_ab,
but are based on different estimates of Δ_g, Δ_m and λ.
Negative values of Δ_g are observed when sedimentary
carbonates are strongly affected by methanogenic
diagenesis (Irwin et al. 1977). In Phanerozoic strata,
which have formed in the presence of relatively
abundant O2 and SO4²⁻, this phenomenon is restricted
to concretions or other zones in which supplies of
sedimentary organic matter have been large enough
that methanogenesis has eventually become prominent.
When concentrations of O2 and SO4²⁻ in
seawater were significantly lower, methanogenesis
must have been more important. Accordingly, the
effects of negative values of Δ_g, corresponding to
globally important levels of methanogenic diagenesis,
are explored in figure 2b,c. Figure 2d shows that
variations in Δ_m are probably least important in
affecting estimates of f. Under steady-state conditions,
inversion of the oceanic gradient, that is, enrichment of
¹³C in bottom waters (Δ_m = −5‰, figure 2d), is
practically required to produce δ_ab − δ_i < 0.


Figure 2. Graphs indicating relationships between f and
δ_ab − δ_i (equation (1.7)). Slopes and intercepts vary in response to
varying values of ε, λ, Δ_m and Δ_g. In each frame, one of these
has been assigned four different values and the others have
been held constant. The values assigned are indicated in each
frame. For the parameter that varies, the sequence of values
corresponds to the lines as seen from left to right.

2. REDOX BALANCES IN THE CARBON CYCLE

Figure 3 duplicates the plan of figure 1, but depicts
flows of oxidants and reductants generated by the
carbon cycle. The biological cycle of production and
respiration is at its centre. Redox partners for C are
generalized as Red and Ox. The focus on oxidants
rather than oxygen is necessary. The carbon-burial flux
provides information about the consumption of
electron donors, but not about the identity of those
donors. Over the course of Earth history, the
electron-donating role of Red has been played by H2, Fe²⁺,
and H2O (at least). Correspondingly, the oxidized
forms of these substances have accumulated in Earth's
crust. The rate of accumulation will be set by the
difference between the rate at which carbon-cycling
produces oxidants and the rate at which those oxidants
are consumed.

To quantify rates, we begin by viewing the J_o terms,
introduced above as fluxes of carbon, as fluxes of
reducing power. The L terms shown in figure 3
represent flows of oxidizing or reducing power carried
by other elements or by mixtures of C and other
elements, in (mol O2 equivalent) per time. Because
reduction of CO2 to organic carbon and oxidation of
H2O to O2 are both four-electron processes, values of
the J_o and L terms are numerically equivalent.
Production of 1 mol of organic carbon could be
balanced by oxidation of 2 mol of H2, 4 mol of Fe²⁺,
0.5 mol of S²⁻ (to SO4²⁻), or 2 mol of H2O. All would
be equivalent to 1 mol of O2. To emphasize that it
pertains to the net effect of multiple processes, rather
than to a specific flux, we use A_Ox to designate the rate
at which oxidants accumulate in the crust and exogenic
reaction chamber.
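
The electron bookkeeping behind these equivalences can be sketched as follows. The dictionary and function names are assumptions made for illustration; the electron counts are standard oxidation-state arithmetic (one electron per Fe²⁺ oxidized to Fe³⁺, eight per sulphide oxidized to sulphate, two per H2 or H2O).

```python
# Electrons released per mole of reductant on oxidation (illustrative table).
ELECTRONS_PER_MOL = {
    "H2 -> H2O": 2,
    "Fe2+ -> Fe3+": 1,
    "S2- -> SO4 2-": 8,
    "H2O -> O2": 2,
}
ELECTRONS_PER_MOL_ORGANIC_C = 4  # CO2 -> organic carbon, as stated in the text


def mol_reductant_per_mol_organic_c(electrons_per_mol):
    """Moles of reductant oxidized to balance burial of 1 mol organic C."""
    return ELECTRONS_PER_MOL_ORGANIC_C / electrons_per_mol


for couple, n in ELECTRONS_PER_MOL.items():
    print(f"{couple}: {mol_reductant_per_mol_organic_c(n):g} mol")
```

Each result is equivalent to 1 mol of O2, which is why the J_o and L terms can be compared on a common scale.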

To begin, the biological carbon cycle releases
oxidants at the rate at which organic carbon is buried
in deep sediments (J_ob). The rate of accumulation is
then moderated by effects of two types. Within the
crust itself, a portion of the oxidants is consumed by the
geological carbon cycle. Two pathways are shown in
figure 3. The first is oxidative weathering of organic
carbon exposed on and eroded from the continents
(J_or). The second is oxidation of reduced gases
produced at subduction zones. The corresponding
flux is designated as L_Redv. It includes not only volcanic
gases (H2, CO, SO2), but also products ranging from
methane to petroleum, which are delivered to the
exogenic reaction chamber by thermal processes in
subduction zones and deep basins.

Figure 3. A schematic of flows of oxidants and reductants generated by the carbon cycle. Sites of oxidation are marked by
the letter O.

The export of oxidants and reductants also moderates
A_Ox. This occurs in subduction zones and at the
top of the atmosphere. Within subduction zones,
electrons are transferred between subducted organic
materials (J_os) and oxidants (metal oxides and SO4²⁻,
L_Oxs) in the descending slab. Products will include not
only those returned to the crust (noted above), but also
unconsumed oxidants (L_Oxd) and reduced carbon
(diamond, graphite, J_od) exported to the mantle.
Subduction of carbonate decreases the mass of carbon
in the crust, but, because carbon is supplied from the
mantle as CO2, does not export either oxidizing or
reducing power from the crust. Subduction of sulphide
is similarly inconsequential. Mantle S occurs as
sulphide even under oxidizing conditions (e.g.
ΔFMQ-2; Luth 2004).

At  the  top  of  the  atmosphere,  H2  can  be  lost  to 
space,  thus  exporting  reducing  power  and  effectively 
increasing A_Ox. This occurs when reduced gases of
either  volcanic  or  biological  origin  reach  high  altitudes. 
Depending  on  atmospheric  conditions,  a  portion  of  the 
reducing  power  carried  by  these  gases  can  be  lost  as 
hydrogen  escapes  to  space  (Catling  et  al.  2001). 

Summation provides an estimate of the rate at which
oxidants will accumulate:

$$A_{Ox} = (\text{release of oxidants within crust by C cycle}) + (\text{reductant export}) - (\text{net oxidant export})$$

$$A_{Ox} \geq (J_{ob} - J_{or} - L_{Redv}) + L_{Redx} - (L_{Oxd} - J_{od}). \qquad (2.1)$$

The second expression denotes A_Ox as a minimum
value because the net release of oxidants by the C cycle
may exceed J_ob − J_or − L_Redv. This would occur if a
portion of the organic carbon returning from continents
(J_or) was not reoxidized but instead simply
reburied. The efficiency of reoxidation will depend on
the nature of Ox (e.g. O2 versus SO4²⁻) and the
reactivity of J_or (e.g. graphitic kerogen versus
hydrocarbons).
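
The structure of equation (2.1) can be made concrete with a small helper. This is a sketch under stated assumptions: it takes the three-term form implied by the word equation (crustal release, plus reductant export, minus net oxidant export), the function and argument names are hypothetical, and the numbers passed in below are arbitrary placeholders rather than estimates from the text.

```python
def oxidant_accumulation_lower_bound(j_ob, j_or, l_redv, l_redx, l_oxd, j_od):
    """Right-hand side of equation (2.1), in mol O2 equivalent per time."""
    crustal_release = j_ob - j_or - l_redv  # net release of oxidants within the crust
    reductant_export = l_redx               # e.g. hydrogen escape to space
    net_oxidant_export = l_oxd - j_od       # oxidants subducted minus reduced C subducted
    return crustal_release + reductant_export - net_oxidant_export


# Placeholder fluxes in arbitrary but consistent units (not values from the text).
bound = oxidant_accumulation_lower_bound(
    j_ob=10.0, j_or=8.0, l_redv=1.5, l_redx=0.1, l_oxd=0.3, j_od=0.2
)
print(f"A_Ox is at least {bound:.2f}")
```

Keeping the three groups separate makes it easy to see which process dominates when any single flux is varied.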

Equation (2.1) ignores a category of processes often
considered in oxygen budgets. Sediments and oceanic
crust assimilated by continents will incorporate portions
of the Red and Ox from the exogenic reaction
chamber. Oxidized and reduced forms of sulphur and
iron are prime examples. Surging flows of these
reactants to the sediments or from continents can, for
example, significantly affect levels of O2 in the
atmosphere and ocean. These and related phenomena,
especially in the sulphur cycle, have already been
elegantly treated by others (Berner 2004; Canfield
2004).

Imbalances yielding a net release or consumption of
oxidizing power by the carbon cycle are of two types.
The more obvious are marked by isotopic signals
indicating rapid variations of f and consequent
departures from steady state. At such times, the system
will evolve dynamically (Rothman 2003). At other
times, when f varies slowly, the system will evolve
quasi-statically through a succession of steady states. The
persistence of small imbalances between fluxes can lead
to the accumulation of crustal inventories of carbon,
chiefly on the continents. In this case, the fluxes
designated in equation (2.1) should be represented as
time-dependent variables. Then, at any time, t,

$$\int_0^t A_{Ox}(\tau)\,d\tau = \int_0^t \left[J_{ob}(\tau) - J_{or}(\tau) - L_{Redv}(\tau)\right]d\tau + \int_0^t L_{Redx}(\tau)\,d\tau - \int_0^t \left[L_{Oxd}(\tau) - J_{od}(\tau)\right]d\tau. \qquad (2.2)$$

The first term on the right-hand side is the integrated
difference between burial and reoxidation of organic
carbon. It quantifies the accumulation of organic
carbon in continents and marine sediments. The
second and third terms quantify the effects of loss of
H2 to space and of subduction of oxidized and reduced
materials. Notably, this summation of oxidation has no
isotopic dimensions. J_ob is related to f and thus to the
isotopic record, but it is J_ob − J_or − L_Redv that matters,
and it is further altered by effects of subduction and
escape of H2 to space.

The  first  term  in  equation  (2.2)  is  closely  related  to  a 
principle  dating  from  the  nineteenth  century  (J.  J. 
Ebelmen’s  work  from  1845  to  1855,  reviewed  by 
Berner  &  Maasch  1996)  and  elaborated  in  modern 
detail  most  influentially  by  Garrels  (e.g.  Garrels  & 
Perry  1974)  and  Berner  (2004  and  earlier  references 
cited  therein).  Specifically,  the  amount  of  organic 
carbon  stored  in  the  crust  should  balance  the  oxidizing 
power represented by the crustal inventories of Fe³⁺,
SO4²⁻ and O2. To this, space science and plate tectonics
have  added  the  second  and  third  terms. 

It is difficult to reconstruct the histories of the
variables in equation (2.2). A boundary value for the
first integral, however, can be obtained by consideration
of crust-mantle carbon budgets and variations in
f, the organic-carbon burial fraction.

3.  INPUTS  OF  MANTLE  CARBON 

Fluxes  and  inventories  throughout  the  carbon  cycle 
depend  on  inputs  of  carbon  from  the  mantle.  These 
occur  at  mid-ocean  ridges,  arc  volcanoes,  hotspots  and 
in  plume  events.  The  present  strengths  of  these  sources 
will  be  considered  sequentially.  An  estimate  of 
variations  over  the  course  of  Earth  history  will  follow. 

(a)  Mid-ocean  ridges 

Each year, 21 km³ of basalt is added to the oceanic
crust at spreading centres (Crisp 1984; also consistent
with a plate-creation rate of 3.4 km² yr⁻¹ (Rowley
2002) and a plate thickness of 5-7 km (White et al.
1992; Kadko 1994)). Sampled after cooling at the
seafloor, its CO2 content is dependent on the
hydrostatic pressure. At the depths of mid-ocean
ridges, the result is commonly about 200 p.p.m. (the
routinely reported concentrations refer to weights of
CO2). The question is how much CO2 was in the
parent magma. The difference will have been transferred,
by way of hydrothermal circulation, to the
ocean.

Recently, Saal et al. (2002) have shown that, in
undegassed MORB (mid-ocean ridge basalt), concentrations
of CO2 vary with those of Nb. The weight ratio
is CO2 : Nb = 239 ± 46 (2σ). Since Nb is not lost
during degassing, the initial CO2 content of a sample of
MORB can be estimated from its content of Nb. The
average for normal MORBs (i.e. those in which
trace-element abundances have not been affected by
proximity to plumes or recently subducted continental
materials) on the East Pacific Rise is 3.45 p.p.m. Nb
(Su & Langmuir 2003). The estimated, average, initial
content of CO2 is thus 3.45 × 239 = 825 p.p.m. If
200 p.p.m. remain in the cooled basalt, the difference
transmitted to the exogenic reservoir is 625 p.p.m.
Given a rock density of 2.8 g cm⁻³, this corresponds to
0.8 Tmol C yr⁻¹ (Tmol = teramol = 10¹² mol). If the
Nb average quoted for all MORBs (5.02 p.p.m.; Su &
Langmuir 2003) is instead used as the basis for the
calculation, the result is 1.3 Tmol C yr⁻¹. These values
bracket a third estimate, namely 0.9 Tmol C yr⁻¹,
reported in the original publication (Saal et al. 2002)
and reached using a slightly different approach.

Numerous alternative estimates have been based on
CO2 : ³He ratios in hydrothermal fluids and gases.
These have been reviewed by Resing et al. (2004), who
settle on a range of 0.5-2 Tmol C yr⁻¹. Preferring the
approach based on chemical analyses of the rocks, we
adopt 1 Tmol C yr⁻¹ as the present magnitude of the
mid-ocean ridge component of J_am.
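
The Nb-based arithmetic above can be reproduced in a few lines. This is a sketch, not the authors' own calculation: the function and constant names are hypothetical, the molar mass of CO2 (44.01 g per mol) is supplied here, and the concentrations are treated as weight fractions of CO2, as the text indicates.

```python
CO2_MOLAR_MASS = 44.01    # g per mol (assumed constant, not given in the text)
BASALT_VOLUME = 21.0      # km^3 per yr (Crisp 1984, as quoted)
ROCK_DENSITY = 2.8        # g per cm^3
CO2_PER_NB = 239.0        # weight ratio (Saal et al. 2002)
RESIDUAL_CO2_PPM = 200.0  # p.p.m. by weight remaining in cooled basalt


def ridge_carbon_flux_tmol(nb_ppm):
    """Mantle CO2 released at spreading centres, in Tmol C per yr."""
    initial_ppm = nb_ppm * CO2_PER_NB
    degassed_ppm = initial_ppm - RESIDUAL_CO2_PPM
    basalt_grams = BASALT_VOLUME * 1e15 * ROCK_DENSITY  # 1 km^3 = 1e15 cm^3
    co2_grams = basalt_grams * degassed_ppm * 1e-6
    return co2_grams / CO2_MOLAR_MASS / 1e12            # mol -> Tmol


print(round(ridge_carbon_flux_tmol(3.45), 2))  # normal MORB, East Pacific Rise
print(round(ridge_carbon_flux_tmol(5.02), 2))  # average for all MORBs
```

The two results land near the 0.8 and 1.3 Tmol C yr⁻¹ quoted in the text; small differences reflect rounding of the intermediate p.p.m. values.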

(b)  Arc  volcanoes 

The annual magma volume is 0.5 km³ (Carmichael
2002). Carbon dioxide is abundant in the gases, but its
isotopic composition often deviates from the mantle
value and the CO2 is regarded as deriving from
subducted sedimentary carbonates and organic carbon
as well as from the mantle (Sano & Marty 1995; Sano &
Williams 1996; Shaw et al. 2003, 2004). The total flux
of CO2 from arc volcanism is approximately
1.6 Tmol yr⁻¹ (Hilton et al. 2002). Of this, approximately
13% is from the mantle (Shaw et al. 2003),
yielding an arc-volcanic component of J_am ≈
0.2 Tmol C yr⁻¹. The remaining 1.4 Tmol yr⁻¹ is
recycling crustal C. Wallace (2005) obtains a similar
result by a different method.

(c)  Oceanic  islands  and  plumes 

In a collection of estimates, this is the most uncertain.
The annual volume of magma is approximately 3 km³
(though possibly as small as 1.9 km³), combining
igneous provinces and hotspot volcanoes on continents
with those in the ocean (Crisp 1984). A more recent,
separate tabulation of large igneous provinces by
Marty & Tolstikhin (1998) finds a total of 95.5 ×
10⁶ km³ in the past 250 Myr, for a rate of 0.4 km³/yr. It
is broadly agreed that these magmas are volatile-rich
compared to MORB. Basing their estimate on ³He
budgets, Marty & Tolstikhin (1998) suggest that the
total output from oceanic islands and plumes 'is at best
similar to that of spreading centres'. Given our estimate
above, this suggests an input of somewhat less than
1 Tmol C yr⁻¹.

Since mid-ocean ridge magmas are roughly 10 times
more voluminous, the estimated equal flux of mantle
CO2 from oceanic islands and plumes calls for a 10-fold
enrichment of CO2, and thus perhaps of Nb, in the
parent magmas. Observed enrichments of Nb (Hofmann
2004) range from at least 13-fold (Mangaia, Pitcairn,
Tahaa) to 2-fold (Mauna Loa). Moreover, Pineau et al.
(2004) have suggested that CO2 : Nb ratios might
range to values more than 3-fold higher than that
found by Saal et al. (2002). In sum, the estimate of
1 Tmol C yr⁻¹, equal to that at the spreading centres,
is plausible but highly uncertain.

Together, mid-ocean ridges, arc volcanoes and
emissions at island volcanoes and during plume events
provide an annual input from the mantle of approximately
2.2 Tmol C.
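
The budget can be tallied explicitly. The snippet below simply restates the numbers quoted in this section; the variable names are illustrative, and the island and plume term carries the large uncertainty noted above.

```python
# Present-day mantle carbon inputs quoted in the text, in Tmol C per yr.
ARC_TOTAL_CO2 = 1.6         # total arc CO2 flux (Hilton et al. 2002)
ARC_MANTLE_FRACTION = 0.13  # mantle-derived share (Shaw et al. 2003)

inputs = {
    "mid-ocean ridges": 1.0,
    "arc volcanoes": ARC_TOTAL_CO2 * ARC_MANTLE_FRACTION,
    "oceanic islands and plumes": 1.0,  # plausible but highly uncertain
}
total_mantle_input = sum(inputs.values())
recycled_arc_carbon = ARC_TOTAL_CO2 - inputs["arc volcanoes"]

print(f"total mantle input ~ {total_mantle_input:.1f} Tmol C/yr")
print(f"recycled crustal C at arcs ~ {recycled_arc_carbon:.1f} Tmol C/yr")
```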

4. INVENTORIES AND ACCUMULATION OF CARBON

Total quantities of carbonate carbon in the crust,
estimated from stratigraphic inventories (Holser et al.
1988; Wedepohl 1995; Hunt 1996; Des Marais 2001;
Berner 2004; Arvidson et al. in press), commonly range
from 2800 to 6500 Emol (Emol = examol = 10¹⁸ mol).
The same reports provide estimates of the total
quantity of organic carbon ranging from 675 to
1300 Emol. Only one of these (Arvidson et al. in
press) includes carbonate associated with basalt. It also
yields the lowest ratio of organic to total carbon,
namely 0.13. The other reports yield organic fractions
ranging from 0.15 to 0.20, and seem to be influenced
by isotopic mass balances, which suggest higher relative
quantities of organic carbon.

Wilkinson & Walker (1989) took an alternative
approach and focused exclusively on sedimentary
carbonates. Examining mass-age data, they found
that the best fit could be provided by a constant-mass,
constant-burial system with first-order recycling
and including 9600 Emol carbonate carbon. An
alternative fit emphasizing data from younger
sequences yielded a result of 7900 Emol. If we
arbitrarily adopt an organic-carbon fraction of 0.15,
the corresponding inventories of organic carbon are
1700 and 1400 Emol, for total carbon inventories of
11 300 and 9300 Emol.

The  most  detailed  inventory  (Holser  et  al.  1988) 
provides  total  crustal  C  =  7640  Emol.  Favouring  the 
approach  using  mass-age  data,  particularly  that  based 
on  younger  sequences,  we  weight  it  equally  with  the 
stratigraphic  compilations  and  estimate  total  crustal 
C  =  8500  Emol. 
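
These inventory figures follow from simple proportions, sketched below. The function name is hypothetical, and the equal-weighting step reflects one reading of the text (a simple mean of the Holser et al. inventory and the mass-age total for younger sequences), which is an assumption.

```python
def inventories_from_carbonate(carbonate_emol, organic_fraction=0.15):
    """Return (organic, total) carbon in Emol for a carbonate inventory,
    where organic_fraction is organic C divided by total C."""
    total = carbonate_emol / (1.0 - organic_fraction)
    return total - carbonate_emol, total


org_all, total_all = inventories_from_carbonate(9600.0)      # best overall fit
org_young, total_young = inventories_from_carbonate(7900.0)  # younger sequences

# Assumed reading of "weight it equally": mean of 7640 Emol (Holser et al. 1988)
# and the mass-age total based on younger sequences.
equal_weight = 0.5 * (7640.0 + total_young)

print(round(org_all), round(total_all), round(org_young), round(total_young))
print(round(equal_weight))
```

The rounded outputs agree with the 1700, 11 300, 1400 and 9300 Emol quoted above, and the mean is close to the adopted 8500 Emol.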

The time required to accumulate this inventory
depends on assumptions about the input flux. If the
total flux of 2.2 Tmol C yr⁻¹ estimated above were
constant, the time required would be 3.86 Gyr. More
commonly, it is believed that, earlier in Earth history,
such fluxes were higher. Here, we follow Sleep &
Zahnle (2001) and Lowell & Keller (2003). These
authors scale input fluxes to estimates of
high-temperature heat flow, chiefly at spreading centres.
For the case in which continents grew episodically from
10 to 80% of current area between 3200 and 2500 Myr
ago (Ma), high-temperature heat flow is calculated to
decrease from 9× the current level at 3800 Ma, to
4.9× at 3200 Ma, to 2.3× at 2500 Ma, and then
to decline exponentially. The resulting scaled carbon
flux is shown in figure 4.

Integration  of  that  flux  beginning  at  3800  Ma,  the 
end  of  the  late,  heavy  bombardment,  yields  the  totals 
depicted  graphically  in  figure  5.  The  sum  exceeds 
8500  Emol,  the  estimated  present  crustal  inventory, 
after  only  575  Myr,  at  3225  Ma.  In  fact,  with 
continents  just  beginning  to  form  and  crustal  storage 
reservoirs  thus  sharply  restricted,  returns  of  carbon  to 
the  mantle  would  probably  have  become  nearly  equal 
to  inputs  from  the  mantle  well  before  then. 
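
The accumulation argument can be explored numerically. The sketch below rests on assumptions that are not in the text: the flux multiplier is interpolated linearly between the quoted anchor points, a 1 Myr step is used, and all names are hypothetical. Because the published scaling curve may have a different shape, the crossing age it gives should be read as indicative only.

```python
PRESENT_FLUX = 2.2  # Tmol C per yr, numerically equal to Emol per Myr
INVENTORY = 8500.0  # Emol, estimated present crustal inventory

# Heat-flow multipliers quoted in the text: (age in Ma, multiple of present).
ANCHORS = [(3800.0, 9.0), (3200.0, 4.9), (2500.0, 2.3)]


def flux_multiplier(age_ma):
    """Assumed piecewise-linear interpolation between the quoted anchors."""
    for (a_old, m_old), (a_young, m_young) in zip(ANCHORS, ANCHORS[1:]):
        if a_young <= age_ma <= a_old:
            frac = (a_old - age_ma) / (a_old - a_young)
            return m_old + frac * (m_young - m_old)
    raise ValueError("age outside the interpolated range")


def crossing_age(step_myr=1.0):
    """Age (Ma) at which the summed mantle input first exceeds INVENTORY."""
    age, total = 3800.0, 0.0
    while total <= INVENTORY:
        total += PRESENT_FLUX * flux_multiplier(age) * step_myr
        age -= step_myr
    return age


print(f"constant flux: {INVENTORY / PRESENT_FLUX / 1000.0:.2f} Gyr")
print(f"scaled flux exceeds the inventory near {crossing_age():.0f} Ma")
```

The constant-flux case reproduces the 3.86 Gyr quoted above. Under the assumed linear interpolation, the scaled flux crosses the inventory a few tens of Myr earlier than 3225 Ma, which is the same order as the published result.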

Together, figure 5 and equations (1.1)-(2.2) provide
a  new  context  for  considering  the  development  of  the 
carbon  cycle.  Before  turning  to  the  isotopic  records, 
however,  we  must  first  review  available  information 
regarding  the  fates  of  carbon  in  subduction  zones. 


5.  CARBON  CYCLING  AT  SUBDUCTION  ZONES 

Processes in subduction zones are crucial to redox
balances in the carbon cycle. Figure 6 shows flows of
carbon (in oxidized and reduced form) and other
products of carbon cycling into, through and out of
subduction zones. Reducing power produced by the
carbon cycle is carried into subduction zones by
organic carbon (J_os). Oxidizing power is carried by
sulphate and by ferric iron. The crossing paths and
redox reactions suggest the diverse transformations
involved. While many different sequences of reactions
are possible, the equations in figure 6 summarize the
required balances of mass and electrons. All of the
carbon that enters the subduction zone must be
transferred in some form to either the crust or mantle.
And the balance between inputs of reductants and
oxidants must be reflected by materials leaving the
subduction zone. Resolution and dissection of
processes within subduction zones are not currently
possible, but available evidence bears on two key
questions. When a slab is subducted, (i) what happens
to the carbon and (ii) what happens to the reducing
power carried by the organic matter?

Figure 4. Estimated values of J_am as a function of time. The
value at 0 Ma is documented in the text. The scaling
relationship at earlier times derives from Sleep & Zahnle
(2001) and Lowell & Keller (2003).

Figure 5. Total amount of mantle carbon delivered to the crust
as a function of time. Specifically, total mantle-to-crust
C = $\sum_{3800}^{t} J_{am}(t)\,\tau$, where t is the age, Myr, and τ, the
time-step, is 10^ years. The broken line at 8500 Emol C represents
the best estimate of the present crustal inventory of C (table 2).

Figure 6. Schematic of fluxes of carbon and reducing power in a subduction zone. Terms denote Fe³⁺/ΣFe (φ), and fluxes of
carbon (J, mol per time), specific substances (j, mol per time) and oxidizing or reducing power (L, mol O2 equivalent per
time). Subscripts are defined in table 1. Chemical reductions and oxidations are marked by circled letters (R or O). Adjacent
numbers indicate number of electrons gained or lost.

(a)  Carbon  fluxes 

The downgoing flux, (J_ad + J_od), might be quantified
directly if we knew how much material was being
incorporated by the mantle and if samples of it were
returned to the surface, so that their carbon contents
could be determined. Although J_od and J_ad are sampled
by diamond-bearing rocks and mantle xenoliths, it is
not clear how representative these samples are, nor do
they allow for quantification of the total downgoing C
flux.

More headway can be made by considering the
fluxes from arc volcanism and other volatile emissions
from subduction zones and then estimating the downgoing
flux from the difference between the trench input
and surface output. The magnitudes of both are rather
uncertain; here, we summarize the best estimates. The
three inputs are the organic and carbonate components
of subducted sediment (J_os and J_as) and the carbonated
basalt in the subducting crust (J_az). The global
subducting-sediment budget of Plank & Langmuir
(1998) puts J_as at 0.9 Tmol C yr⁻¹. Estimates of J_os are
very poorly constrained, mostly because it is small in
comparison to fluxes to and from the continents and
shelves. For the modern C cycle, we adopt the value
estimated by Holser et al. (1988), J_os =
0.2 Tmol C yr⁻¹, suggesting a mean Corg content for
subducting sediment of 0.13 wt%.

The third flux, J_az, likely delivers most of the carbon
to subduction zones. It derives from J_aw, for which
estimates range over more than an order of magnitude
from less than 1 to more than 3 Tmol C yr⁻¹ (Bach
et al. 2003). The alternate fate for carbon buried by
basalt carbonation is represented by J_ah, the portion of
carbonated basalt actually accreted onto continents.
Since this is relatively small, it follows that carbonate
can be stored in ocean crust along passive margins for
hundreds of millions of years, as in much of the Atlantic
basin today. The idea that J_az approaches J_aw
incorporates an assumption that, following tectonic
rearrangements, such material is eventually subducted.
The resulting rough estimate of the total subduction
input is 2-4 Tmol C yr⁻¹.

For the volatile outputs from arcs, the most
complete budget has been compiled by Hilton et al.
(2002), who considered subduction inputs to and
volatile emissions from 26 arc systems worldwide. The
authors calculated a global volcanic arc CO2 flux of
1.6 Tmol C yr⁻¹ and emphasized the importance of
distinguishing between sources of volatiles. For CO2,
these are carbonate and organic C in the subducting
slab and CO2 in the mantle wedge. Since the mantle
component amounts to 0.2 Tmol C yr⁻¹ (noted
above), j_vCO2 = 1.4 Tmol C yr⁻¹. The portion due to
subducted carbonates can be estimated from the
isotopic composition of the non-mantle component.
For all of the arcs, it exceeded the input of sedimentary
carbonate. In all cases but one, inclusion of carbonated
basalt eliminated the shortfall. Notably, these results
are at odds with calculated phase equilibria that
indicate carbonated basalts should undergo little
devolatilization along most subduction geotherms
(Kerrick & Connolly 2001). The other component of
J_av is j_sCO2, emission of CO2 from seeps, especially in
fore-arc and back-arc regions. This flux is unconstrained,
and may be as large as j_vCO2 (Hilton et al.
2002). Ingebritsen & Manning (2002) have pointed to
diffuse fluid flow through tectonically active crust as
potentially a major flux of subduction-derived volatiles.
This degassing pathway may be sufficient to reconcile
the crust-mantle water balance, and could well
constitute a significant return of slab-derived CO2 to
the crust. Accordingly, J_av is between 1.4 and
2.8 Tmol C yr⁻¹.

The other return flux of carbon from subduction
zones to the exogenic chamber is the reduced
carbon from high- and low-temperature seeps, primarily
CH4. The related geological forms are diverse, and
include mud volcanoes and seeps associated with gas
hydrates (Milkov & Etiope 2005; Milkov 2005). Their
output of CH4, estimated at 2.1 Tmol yr⁻¹ (Milkov &
Etiope 2005), derives from microbial methanogenesis
and thermal processes in sediments and sedimentary
rocks, as well as from subducted carbon. Since the
carbon in J_ov (= CH4) has oxidation number equal to
−4 and the organic carbon in J_os has oxidation number
equal to zero, redox balance provides the stronger
constraint. If J_os is approximately 0.2 Tmol C yr⁻¹, J_ov
cannot be larger than 0.1 Tmol C yr⁻¹. Total volcanic
and seep fluxes of carbon (J_av + J_ov) are between 1.5
and 2.9 Tmol C yr⁻¹.
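
The redox limit on the methane flux can be illustrated with oxidation-number bookkeeping. The sketch assumes that the only source of electrons for reducing organic carbon to CH4 is oxidation of the remaining organic carbon to CO2 (a disproportionation); that assumption, and the function name, are introduced here for illustration.

```python
OX_ORGANIC_C = 0   # oxidation number of C in subducted organic matter
OX_METHANE_C = -4  # oxidation number of C in CH4
OX_CO2_C = 4       # oxidation number of C in CO2


def max_methane_flux(j_os):
    """Largest CH4 flux (same units as j_os) that organic carbon alone can
    supply when the electrons come from oxidizing the rest of it to CO2."""
    electrons_needed_per_ch4 = OX_ORGANIC_C - OX_METHANE_C  # 4 electrons gained
    electrons_given_per_co2 = OX_CO2_C - OX_ORGANIC_C       # 4 electrons released
    # x mol go to CH4 and (j_os - x) mol go to CO2; solve the electron balance for x.
    return j_os * electrons_given_per_co2 / (
        electrons_needed_per_ch4 + electrons_given_per_co2
    )


print(max_methane_flux(0.2))  # J_os = 0.2 Tmol C per yr
```

Half of the organic carbon can end up as methane under this assumption, which matches the ceiling of 0.1 Tmol C yr⁻¹ given in the text.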

From inputs of 2-4 Tmol C yr⁻¹ and recycling
fluxes of 1.5-2.9 Tmol C yr⁻¹, we estimate that the
fraction recycled is approximately 0.6. By difference, the
flux returning to the mantle is 0.8-1.6 Tmol C yr⁻¹.
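
One way to reproduce the quoted mantle-return range is to apply the recycled fraction to the input range, as sketched below. Treating the rounded fraction of 0.6 as exact is an assumption made here, and the names are illustrative.

```python
SUBDUCTION_INPUT = (2.0, 4.0)  # Tmol C per yr, low and high estimates
RECYCLED_FRACTION = 0.6        # approximate fraction returned to the crust


def mantle_return(total_input, recycled_fraction=RECYCLED_FRACTION):
    """Carbon carried into the mantle, by difference from the recycled part."""
    return total_input * (1.0 - recycled_fraction)


low, high = (mantle_return(x) for x in SUBDUCTION_INPUT)
print(f"{low:.1f}-{high:.1f} Tmol C/yr")
```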

(b)  Reducing  power 

Any reduced C sent to the mantle (graphite, diamond,
elemental C) leaves oxidant behind in the surface
environment, thus contributing to A_Ox. It is also
possible that organic C would be oxidized within the
subduction zone by reaction with sulphate or an
oxidized metal, such as Fe³⁺. When this occurs, does
the reduction product (Fe²⁺, for example) become an
organic-carbon proxy that also contributes to A_Ox? No,
the reaction instead amounts to a last-minute reversal
of processes within the carbon cycle. Within the
exogenic reaction chamber, Fe³⁺ will have been
produced within the downgoing slab by hydrothermal
alteration and by oxidation of Fe²⁺ at the expense of
O2. Or the sulphate will have been produced by
processes within the carbon cycle. In either case, the
oxidizing power will be represented by an equivalent
quantity of organic carbon. For the carbon cycle,
therefore, the oxidation of organic carbon by oxidized
metals or sulphur within subduction zones is a
functional equivalent of biological respiration. Details
follow.

If the Fe³⁺ or sulphate is a product of aerobic
oxidation,  it  carries  the  oxidizing  power  of  O2  produced 
during  photosynthesis.  The  same  organism  that 
produced  the  O2  produced  an  equivalent  amount  of 
organic  carbon.  The  oxidation  of  that  organic  material 
within  the  subduction  zone  amounts  to  a  reversal  of  the 
overall  process. 

If the Fe³⁺ was produced at the expense of sulphate
during  hydrothermal  alteration,  that  sulphate  can 
similarly  be  traced  to  photosynthetic  O2  and  carbon. 
The  chain  of  chemical  events  has  an  additional  link,  but 
the  oxidation  of  organic  carbon  within  the  subduction 
zone  is  again  simply  a  reversal  of  the  process. 

If the sulphate was produced by anaerobic, photosynthetic
bacteria, those organisms will have produced
an  equivalent  amount  of  organic  carbon.  The  oxidation 
of  that  organic  material  within  the  subduction  zone 
amounts  to  a  reversal  of  the  overall  process. 

Finally, if the Fe³⁺ was produced by serpentinization,
an equivalent quantity of H2 will also have been
produced and used by microbiota to produce an
equivalent amount of organic carbon, either directly,
through chemosynthesis, or indirectly, through consumption
of O2, photosynthetic production of organic
matter, etc. Again, the oxidation of organic C within
the subduction zone amounts simply to a reversal.

If the subducted organic carbon returned to the
surface as CO2 after reducing an inorganic substance to
some oxidation state lower than that of its input (if
Fe³⁺/ΣFe in the downgoing slab were driven to values
lower than Fe³⁺/ΣFe in unaltered MORB), that would
contribute to A_Ox. Failing that, J_od > 0 provides the
only route by which processes in subduction zones can
yield a net export of oxidizing power by the carbon
cycle.

In magnitude, J_od is constrained to be less than J_os.
Exchange of C between reduced and oxidized pools is
not excluded. Some of the C in J_od might derive from
J_as or J_az. Estimation of J_od requires knowledge of J_os
and L_Redv. The rate of subduction of organic matter, as
discussed above, is here taken as 0.2 Tmol yr⁻¹,
though better estimates are clearly warranted. L_Redv
has two components: the reduced gas flux from arc
volcanoes, and the reduced gases from other seeps,
represented by j_CH4.

The reduced component of the volcanic gas flux
is represented by J_H2+SO2 + J_H2S. In the present
context, the question is how it relates to J_os, the
reducing  power  delivered  by  subduction  of  organic 
carbon.  The  hydrogen  abundance  in  volcanic  gases  is 
set  by  redox  equilibrium  with  water,  such  that 
H2 :  H2O  ratios  are  maintained  near  ca  0.01  at 
most  eruptive  temperatures  (Giggenbach  1996). 
Taking the arc magmatic-water flux at 17 Tmol yr^-1
(Wallace 2005) results in a hydrogen component of
0.17 Tmol yr^-1. The SO2 efflux from arcs, some of
which may derive from sources other than reduction
of subducted sulphate, is of similar magnitude,
0.16-0.28 Tmol yr^-1 (Halmer et al. 2002; Wallace
2005). Other reduced species in arc gases are
relatively  minor.  The  ratio  of  H2S  :  SO2  is  generally 
between  1  and  0.05  (Halmer  et  al.  2002),  and  some 
of  the  hydrogen  sulphide  released  at  volcanic  arcs 
likely  derives  from  volatilization  of  sulphide  in  the 
downgoing  slab  rather  than  reduction  of  sulphate  by 
organic carbon. The reducing power carried by other
volatiles, such as CO, COS and CS2, is orders of
magnitude lower. Since H2 and SO2 represent two-electron
reductions, the total flux is halved to
convert to moles O2 equivalent, whereas H2S
produced from sulphate represents eight electrons
or 2 mol O2 equiv. The resulting estimate from gas
chemistry is 0.2 Tmol O2 equiv. yr^-1 from H2 and
SO2 and 0.4-0.02 Tmol O2 equiv. yr^-1 from H2S.
By itself, the flux of H2 and SO2 is already
equivalent to the reducing power carried by subducted
organic carbon. The obvious presence of
additional reduced outputs, namely volcanic H2S
and hydrocarbons at seeps (J_CH4), shows that better
knowledge  of  redox  budgets  for  subduction  zones  is 
needed.  With  due  regard  for  the  uncertainties 
imposed  by  the  present  budgets,  it  also  suggests  that 
most  or  all  reducing  power  carried  by  subducted 
organic  carbon  is  returned  to  the  exogenic  reaction 
chamber and that J_od is small at present.
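
The conversion from gas fluxes to O2 equivalents described above can be retraced with a few lines of arithmetic. The sketch below is illustrative only: the input numbers are those quoted in the text, while the representative SO2 flux of 0.2 Tmol per year used for the H2S estimate and all variable names are assumptions made here, not values stated by the source.

```python
# Arc-gas reducing power expressed as O2 equivalents (Tmol per year).
# Inputs are the figures quoted in the text; SO2_REPRESENTATIVE is an
# assumed round value inside the quoted 0.16-0.28 range.
H2_TO_H2O = 0.01                  # redox-buffered H2 : H2O ratio
ARC_WATER_FLUX = 17.0             # magmatic water flux at arcs
SO2_FLUX_RANGE = (0.16, 0.28)     # SO2 efflux from arcs
H2S_TO_SO2_RANGE = (0.05, 1.0)    # observed H2S : SO2 ratios
SO2_REPRESENTATIVE = 0.2          # assumption for the H2S estimate


def o2_equiv_two_electron(flux):
    """Two-electron reductants (H2, SO2): 0.5 mol O2 per mol."""
    return flux / 2.0


def o2_equiv_eight_electron(flux):
    """H2S made from sulphate carries eight electrons: 2 mol O2 per mol."""
    return flux * 2.0


h2_flux = H2_TO_H2O * ARC_WATER_FLUX
h2_so2_o2 = tuple(o2_equiv_two_electron(h2_flux + so2) for so2 in SO2_FLUX_RANGE)
h2s_o2 = tuple(
    o2_equiv_eight_electron(ratio * SO2_REPRESENTATIVE) for ratio in H2S_TO_SO2_RANGE
)

print(f"H2 flux: {h2_flux:.2f}")
print(f"H2 + SO2 as O2 equiv.: {h2_so2_o2[0]:.3f} to {h2_so2_o2[1]:.3f}")
print(f"H2S as O2 equiv.: {h2s_o2[0]:.2f} to {h2s_o2[1]:.2f}")
```

Under these assumptions the H2 + SO2 term brackets the 0.2 Tmol O2 equiv. per year quoted in the text, and the H2S term spans roughly 0.02 to 0.4.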

6.  CYCLING  OF  CARBON  AND  ITS  REDOX 
PARTNERS  OVER  TIME 

Over time, the crust has accumulated carbon. Integrated
inputs from the mantle have exceeded integrated
returns  to  the  mantle.  As  a  means  of  exploring  the 
balance,  we  can  accept  the  fluxes  and  reservoirs 
proposed  thus  far  as  hypotheses  and  consider  how 
8500  Emol  C  might  have  accumulated  and  what 
electron  donors  were  probably  associated  with  the 
production  of  organic  carbon. 

(a)  The  crust-mantle  carbon  balance 
Values of (J_ad + J_od)/J_am control the accumulation of
crustal C. When they are less than 1, the crustal
inventory will grow. Ideally, a geologic record would
exist, but proxies for (J_ad + J_od) are rare. The low
δ13C values of some diamonds, particularly those of
eclogite  paragenesis,  strongly  suggest  derivation  from 
crustal  organic  carbon  (Pearson  et  al.  2004),  though 
this  has  been  disputed  (Deines  et  al.  2001). 
Recently, it has been suggested that organic carbon
can  be  subducted  beyond  250  km  and  contribute  to 
sublithospheric  diamonds  (Tappert  et  al.  2005).  The 
stability  of  carbonated  eclogite  at  high  pressures  and 
temperatures  (Dasgupta  et  al.  2004)  indicates  that 
eclogitization  may  be  an  important  route  for  the 
subduction  of  carbon  (both  oxidized  and  reduced) 
into  the  mantle.  This  is  particularly  interesting 
in  light  of  recent  suggestions  that  eclogitization  is 
a  geologically  recent  phenomenon  (Bjornerud  & 
Austrheim  2004).  A  hotter  upper  mantle  earlier  in 
Earth  history  would  more  efficiently  devolatilize 
downgoing  slabs  at  shallower  depths,  removing 
both  carbon  and  water.  Shallow  decarbonation, 
combined  with  drying  of  the  slab  and  consequent 
inhibition  of  the  formation  of  eclogite,  may  have 
meant that J_od and J_ad were small early in Earth
history (Des Marais 1985). Much depends on the
tectonic regime, and on when the present style of plate
tectonics began, which has been the subject of much
debate (Van Kranendonk 2004; Stern 2005).

Figure 7. Growth of the crustal inventory of carbon over time.
(a) Outputs to the mantle relative to inputs from the mantle
as a function of time. The ratio is assumed to be near unity
when continents were non-existent and to have declined as
they increased in size. Values have been adjusted to provide
the observed total of 8500 Emol C by 0 Ma. (b) The integral
obtained by applying the return fraction specified in panel (a)
to the fluxes shown in figure 4. In detail,
$$\text{total crustal C} = \sum_{t=3800}^{0} J_{am}(t)\left[1 - \frac{J_{ad}(t) + J_{od}(t)}{J_{am}(t)}\right]\tau .$$

One view takes the slow growth of continents
together with the steep early declines in the rate at
which carbon is delivered from the mantle (figure 4)
as constraining variations in (J_ad + J_od)/J_am quite
strongly. The scenario associated with figure 4
provides 80% of the present continental area by
2.5 Ga. By comparison, Veizer & Mackenzie (2004)
point to evidence suggesting that only 25% of
continental crust accumulated between 4.0 and
2.6 Ga, another 35% in the interval to 1.7 Ga, and
the final 40% thereafter. Because the continents house
the major reservoirs of crustal carbon, (J_ad + J_od)/J_am
must approach 1 (no net crustal storage) in the early
Archaean and decline to lower values only as continents
grow. The time course of (J_ad + J_od)/J_am shown in
figure 7a fits these criteria while eventually yielding a
crustal inventory of 8500 Emol C (figure 7b) and a
modern (J_ad + J_od)/J_am = 0.55. The latter value is in the
middle of the range estimated above (0.36-0.73).


(b)  The  accumulation  of  organic  carbon 
and  oxidized  electron  donors 

Reconstruction of redox budgets requires estimates
of f. These derive from interpretation of the carbon
isotopic records, with due attention to the possible
importance of Δ_m, Δ_g and λ. Figure 8 summarizes
observations of δ_ab and δ_ob. The latter represent
chemically isolated samples of kerogen (Strauss &
Moore 1992). The ratios of H : C in Precambrian
kerogens are frequently well below 0.5 (Hayes et al.
1983). Accordingly, the kerogens have been extensively
dehydrogenated by processes that frequently
involve the loss of 13C-depleted hydrocarbons and
thus enrichment of 13C in the residual kerogen.

Figure 8. Isotopic compositions of sedimentary carbonates
and of sedimentary organic carbon as a function of time. Each
point represents the average isotopic composition of an entire
stratigraphic unit. Carbonates (Shields & Veizer 2002) are
represented by circles. Filled symbols represent geologic units
which are well dated. Open symbols represent units for which
the dates are approximate. Samples of organic material
(Strauss & Moore 1992) are represented by triangles. The
ages assigned in the original compilation have been revised to
agree with those assigned by Shields & Veizer (2002).

Because  our  objective  is  to  reconstruct  probable 
isotopic  compositions  of  organic  carbon  at  the  time 
it  was  removed  from  the  exogenic  reaction  chamber 
(i.e. as it flowed through the arrow representing J_ob
in figure 1), we will base our estimates of f mainly
on the lower values of δ_ob in each time interval.


(c)  4400-3800  Ma 

In  parallel  with  the  isotopic  records  summarized  in 
figure  8,  the  integrations  in  figures  5  and  7  have  begun 
at  3800  Ma.  This  point  in  time  roughly  marks  the  end 
of  the  late,  heavy  bombardment  and  the  beginning  of 
Earth’s  sedimentary  record.  Catling  et  al.  (2001)  have 
considered  how  earlier  events  might  have  set  the  stage 
for  carbon  cycling.  In  particular,  they  estimate  that 
impacts  of  asteroids  between  4400  and  3800  Ma 
probably  delivered  1000  Emol  of  reduced  carbon  to 
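Earth’s surface. To whatever extent such material was
not incorporated by the mantle, Earth’s crust will have
begun with an inventory of primordial organic carbon.
When biological cycling of carbon began, the processes
would have been

production:
$$(\mathrm{CO_2})_{\text{from outgassing}} + \mathrm{Red} \rightarrow (\mathrm{C_{org}})_{\text{from production}} + \mathrm{Ox},$$

recycling:
$$(\mathrm{C_{org}})_{\text{primordial}} + \mathrm{Ox} \rightarrow (\mathrm{CO_2})_{\text{crustal}} + \mathrm{Red},$$

net:
$$(\mathrm{CO_2})_{\text{from outgassing}} + (\mathrm{C_{org}})_{\text{primordial}} \rightarrow (\mathrm{C_{org}})_{\text{from production}} + (\mathrm{CO_2})_{\text{crustal}}.$$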

In these equations, Red and Ox are reduced and
oxidized forms of a redox partner, such as Fe or S.
Reactions  between  surviving,  asteroidal  organic 
material  (termed  ‘primordial’)  and  oxidants  produced 
during  carbon  cycling  might  have  been  biologically 
catalysed  and  occurred  within  the  exogenic  reaction 
chamber  or  they  might  have  been  thermal  and  occurred 
only  at  depth  within  accumulating  sediments.  In  either 
case,  the  resulting  CO2  became  part  of  the  crustal 
inventory.  Such  reactions  would  have  two  effects:  (i)  an 
exchange  of  biotic  organic  carbon  for  primordial 
organic  carbon  and  (ii)  consumption  of  oxidants. 
Only  when  production  acted  to  increase  the  inventory 
of  Corg  (here  taken  to  begin  at  3.8  Ga)  would  oxidants 
begin  to  accumulate. 


(d)  3800-2800  Ma 

For  this  interval,  the  isotopic  record  of  sedimentary 
carbonates  compiled  by  Shields  &  Veizer  (2002) 
includes 24 stratigraphic units. With an average value
of -0.1‰, they are consistently enriched in 13C
relative to mantle carbon. The standard deviation of
the population is 1.9‰. For the same interval, figure 8
yields an estimate of δ_ob = -36‰. Nakamura & Kato
(2004) measured values of δ_aw in basalts from the
Warrawoona Group (3425 Myr, Western Australia).
Their average result, -0.3 ± 1.2‰, does not differ
significantly from δ_ab in sedimentary carbonates in this
time interval. Accordingly, Δ_m ≈ Δ_g, the λ(Δ_m - Δ_g)
terms in equation (1.8) are small, and f = (δ_ab - δ_i)/
(ε - Δ_g) = (δ_ab - δ_i)/(δ_ab - δ_ob) = 0.14. Observed values
of δ_ab and δ_ob then indicate that 14% of the carbon
being delivered to the exogenic reaction chamber by
outgassing of CO2 from the mantle was being buried as
reduced, organic carbon.
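
As a check on the arithmetic, the burial fraction can be recomputed from the two isotopic averages quoted for this interval. This is a minimal sketch: the mantle input value of -5‰ is an assumption made here (it is not stated in this passage, although it agrees with the 4.9‰ difference cited later in the section), and the function and variable names are hypothetical.

```python
# Fraction of carbon buried in organic form, from isotopic mass balance:
#   f = (delta_carb - delta_input) / (delta_carb - delta_org)
# All delta values are in permil.
DELTA_CARB = -0.1     # average sedimentary carbonate, 3800-2800 Ma (from the text)
DELTA_ORG = -36.0     # organic carbon estimate from figure 8 (from the text)
DELTA_INPUT = -5.0    # ASSUMED mantle-carbon value, not given in this passage


def organic_burial_fraction(delta_carb, delta_org, delta_input):
    """Return f, the share of carbon input buried as organic carbon."""
    return (delta_carb - delta_input) / (delta_carb - delta_org)


f_early_archaean = organic_burial_fraction(DELTA_CARB, DELTA_ORG, DELTA_INPUT)
print(f"f = {f_early_archaean:.3f}")
```

With these inputs the numerator is 4.9‰ and the denominator 35.9‰, which rounds to the 0.14 quoted in the text.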

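The net production of organic carbon during any
increment of time can be estimated:

$$
\begin{aligned}
\text{net } \mathrm{C_{org}} ={}& (\mathrm{C_{org}}\ \text{produced by reduction of } J_{am}) \\
&+ (\mathrm{C_{org}}\ \text{produced by reduction of recycling C}) \\
&- (\mathrm{C_{org}}\ \text{oxidized during recycling}).
\end{aligned}
\tag{6.1}
$$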

The recycling carbon (i.e. J_or + J_ar + J_ov + J_av) will be
some portion of the total crustal carbon. If a
representative half-mass age is taken as 300 Myr, the
fraction recycling in a 10 Myr interval will be 2.3% (see
table 1). In any 10 Myr interval, therefore, 2.3% of
crustal carbon will be recycled, essentially mixing with
the 10 Myr input of mantle carbon. Of that total, a
portion controlled by (J_ad + J_od)/J_am will be returned to
the mantle. Of the remainder, a fraction f will be buried
in organic form. Assuming that all recycling organic
carbon is oxidized, an integrable form of equation (6.1)
is then

$$\text{net } \mathrm{C_{org}} = f\left[1 - \frac{J_{ad} + J_{od}}{J_{am}}\right]\left(J_{am}\tau + \gamma M\right) - \gamma M_o, \tag{6.2}$$


where γ is the fraction of crustal carbon recycling in a
time interval of length τ (e.g. 0.023 for τ = 10^7 years
and a half-mass age of 300 Myr), and, at the beginning
of the time-step, M is the total crustal inventory of
carbon in all forms (figure 7b) and M_o is the total
inventory of organic carbon. Stepwise summation
provides an estimate of the integrated production of
organic carbon, the first integral on the right-hand side
of equation (2.2).

Figure 9. Integrated net production of organic carbon,
equivalent to the net release of oxidizing power (Emol O2),
over time. (a) Estimated values of f. For ages greater than
890 Myr, they are based on the isotopic compositions shown
in figure 8. For younger ages, they are consistent with the
isotopic compositions shown in figure 8 and are based in
detail on records summarized by Hayes et al. (1999). The
broken lines between 2080 and 2400 Myr represent isotopic
excursions reviewed by Melezhik et al. (1999) and discussed
in the text. (b) The integral obtained when the f values shown
in figure 9a and the carbon-return ratios shown in figure 7a
are substituted in equation (6.2), with γ = 0.023 and
M = M_o = 0 at 3800 Ma.

The calculation is based on (i) values of (J_ad +
J_od)/J_am chosen to provide the observed crustal
inventory of C (figure 7a); (ii) values of J_am provided
by figure 4; and (iii) values of f shown in figure 9a and
estimated from the observations summarized in
figure 8. Setting M = M_o = 0 at 3800 Ma and summing
net C_org yields the result shown in figure 9b. Because
a sharp drop in δ_ob suggests a major change in carbon
cycling at approximately 2770 Ma (Hayes 1983, 1994),
we will focus first on the interval 3800-2800 Ma. By
2800 Ma, M_o = oxidant release = 300 Emol O2 equiv.
To  place  this  result  in  context,  consider  the 
inventories  summarized  in  table  2.  For  all  of  the 
materials  considered,  a  range  of  estimates  can  be 
found  in  the  literature.  Those  included  here  are 
representative  high  and  low  values.  The  value  of 
300  Emol  O2  equiv.  greatly  exceeds  the  oxidizing 
power  represented  by  O2  in  the  modern  atmosphere 
and  ocean  and  approaches  the  low  estimate  of  the 
amount  of  oxidizing  power  represented  by  total 
crustal  deposits  of  sulphate  and  sulphate  dissolved 
in  seawater.  These  comparisons  decisively  eliminate 
two  otherwise-interesting  possibilities,  namely  that  O 
or  S  could  have  served  as  the  dominant  redox 
partner  for  C  at  this  stage  of  Earth  history.  In  the 
first case, levels of O2 would have risen high enough
to inhibit the mass-independent fractionation of
sulphur isotopes. In the second, concentrations of
dissolved sulphate would have climbed to levels that
lead to large differences between the 34S contents of
sedimentary sulphides and sulphates, a signal that is
not observed until later (Hayes et al. 1992).

Table 2. Crustal inventories of reduced and oxidized products
of carbon cycling (a).

                  estimated crustal total
                  (Emol O2 equivalent)
product           high      low       reference
O2                37.2      37.2      Keeling et al. (1993)
Fe3+ (b)          1860      1020      Goldschmidt (1954); Ronov & Yaroshevsky (1976)
SO4^2-            480       332       Garrels & Perry (1974); Holser et al. (1988)
total oxidants    2377      1389
organic C         1280      675       Des Marais (2001); Arvidson et al. (in press)

(a) A similar compilation has been prepared by Catling et al. (2001). It
includes an estimate of 1000 Emol organic C delivered to the crust by
impacts of asteroids between 4400 and 3800 Ma. The entries above
pertain to the present. To whatever extent primordial organic
material has survived in the crustal inventory, it is included in the
‘organic C’ category.

(b) Calculated from data summarized by Lecuyer & Rickard (1999).
Represents Fe3+ in excess over mantle Fe3+/ΣFe = 0.12 (Bezos &
Humler 2005).
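
The comparison between the 300 Emol O2 equiv. released by 2800 Ma and the inventories of table 2 can be tallied directly. The sketch below uses only numbers quoted in the table and text. Labelling the 1860/1020 row as Fe3+ follows the text's statement that ferric iron is the remaining oxidized product, and the dictionary layout is simply an illustrative choice.

```python
# Crustal inventories from table 2, in Emol O2 equivalent: (high, low).
OXIDANTS = {
    "O2": (37.2, 37.2),
    "Fe3+": (1860.0, 1020.0),   # row label inferred from the text on ferric iron
    "SO4^2-": (480.0, 332.0),
}
ORGANIC_C = (1280.0, 675.0)
RELEASE_BY_2800_MA = 300.0      # integrated oxidant release, from the text

total_high = sum(high for high, _ in OXIDANTS.values())
total_low = sum(low for _, low in OXIDANTS.values())

# How the early release compares with each candidate redox partner.
ratio_to_o2 = RELEASE_BY_2800_MA / OXIDANTS["O2"][1]
ratio_to_low_sulphate = RELEASE_BY_2800_MA / OXIDANTS["SO4^2-"][1]
ratio_to_low_iron = RELEASE_BY_2800_MA / OXIDANTS["Fe3+"][1]

print(f"total oxidants: {total_high:.0f} (high), {total_low:.0f} (low)")
print(f"release / O2 inventory: {ratio_to_o2:.1f}")
print(f"release / low sulphate estimate: {ratio_to_low_sulphate:.2f}")
print(f"release / low Fe3+ estimate: {ratio_to_low_iron:.2f}")
```

The tally reproduces the table's totals and shows the release to be several times the O2 inventory, close to the low sulphate estimate, and well under a third of the low ferric-iron estimate.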

The  remaining  oxidized  product  listed  in  table  2  is 
ferric  iron.  Its  crustal  inventory  is  large  enough  to  offer 
plenty  of  headroom.  Iron  formations  are,  moreover, 
important  components  of  the  Archaean  sedimentary 
record.  The  present  context,  however,  requires  that 
iron  served  as  the  electron  donor  in  primary  production 
while  the  environment  remained  strictly  anaerobic. 
An organism capable of utilizing Fe2+ directly has
recently been isolated (Jiao et al. 2005). In principle,
we need look no further, but alternatives deserve
consideration. These are: (i) the true value of f was
really near zero; (ii) f was 0.14 but no oxidants
accumulated; or (iii) f was 0.14 and the electron
donor was H2, for which the immediate oxidized
product (H2O) was invisible, although iron was
probably the ultimate source of the electrons.

The true value of f could be approximately zero if
the basalt-carbonation correction had been wrongly
excluded and the correct value of λ(Δ_m - Δ_g) was
4.9‰. If λ were 0.9, this would require that the initial
average value of δ_aw had been -5.5‰ and that
Nakamura & Kato (2004) found instead -0.3‰,
because the isotopic compositions of all of their
samples had been affected by post-depositional
carbon-isotopic exchange. However, among 42 samples of
four different types, the most negative value of δ_aw is
-3.4‰ and there is no correlation between δ_aw and the
abundance of carbonate. Evidence for the required
alteration is therefore lacking, making it likely that f
truly approached 0.14.

If f approached 0.14 but no oxidants accumulated
((ii), above), (J_ad + J_od)/J_am must have been 1,
indicating rapid and quantitative return of all carbon to
the mantle, even as 3.5 million km^2 of continental crust
(Lowe 1992) formed prior to 3.2 Ga. Moreover, as
substantial quantities of organic carbon were forming
as required by f = 0.14 (and thus J_ob/J_am ≈ 0.14), the
putatively non-accumulating oxidant, which could not
be either O2 or sulphate and which had to be present
only at trace levels, was able to reoxidize all of the
organic material completely even as the carbon itself
returned quantitatively to the mantle. The required
combination of circumstances is practically impossible.

The third alternative, in which f was approximately
0.14 and H2 served as the electron donor, fits well with
multiple lines of evidence. Anaerobic organisms
capable of producing organic material from CO2 and
H2 are abundant and include both chemoautotrophs
(methanogens, acetogens) and photoautotrophs
(photosynthetic bacteria; Tice & Lowe 2004 have
already postulated that such organisms were important
members of near-shore microbial communities
3.4 Ga). The potential versatility of microbial ecosystems
based on such organisms is great enough to
provide both producer and consumer assemblages and
thus to stabilize carbon cycling and yield globally
consistent isotopic fractionations. Fermentative consumption
would remobilize organic C as a mixture of
CO2 and CH4, providing needed greenhouse warming
(Kasting & Catling 2003) and setting the stage for the
isotopic transient observed at 2.8-2.7 Ga (Hayes
1994).

The required net input of H2 would be 600 Emol
(= 300 Emol O2 equiv.). Owing to the postulated
declines in J_am (figure 4) and increases in [1 - (J_ad +
J_od)/J_am] (equation (6.2)), the required rate would rise
only slowly from 0.5 Tmol yr^-1 at 3500 Ma to
0.9 Tmol yr^-1 at 2800 Ma. If, as is likely, conditions
at the seafloor favoured serpentinization reactions
(Sleep et al. 2004), a flux of 0.7 Tmol H2 yr^-1 could
be  provided  by  alteration  of  less  than  10^^  g  Mg-rich 
(komatiitic)  basalt  per  year.  The  present  rate  of 
generation of ocean crust is about 6 × 10^16 g yr^-1, so
the  requirement  poses  no  problem.  In  the  overall 
sequence  of  electron  transfers,  it  is  only  the  immediate 
oxidized  product  of  carbon  fixation  (H2O)  which  is 
‘invisible’. The reducing power represented by the
organic matter is balanced by Fe3+, which is accumulating
or being subducted.
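
A quick unit check shows that the quoted rates are of the right size. The sketch below converts the 300 Emol O2 equiv. into H2 (two H2 per O2) and averages the 600 Emol over the 1000 Myr between 3800 and 2800 Ma. Treating the demand as spread evenly over that interval is a simplification made here for illustration; the text describes a rate that rises slowly with time.

```python
# Average H2 supply needed to balance the organic carbon buried
# between 3800 and 2800 Ma. Unit prefixes: Emol = 1e18 mol, Tmol = 1e12 mol.
EMOL = 1e18
TMOL = 1e12

O2_EQUIV_EMOL = 300.0        # oxidant release by 2800 Ma, from the text
H2_PER_O2 = 2.0              # 2 H2 + O2 -> 2 H2O
INTERVAL_YEARS = 1000e6      # 3800 Ma to 2800 Ma

h2_required_emol = O2_EQUIV_EMOL * H2_PER_O2
mean_rate_tmol_per_yr = h2_required_emol * EMOL / INTERVAL_YEARS / TMOL

print(f"H2 required: {h2_required_emol:.0f} Emol")
print(f"mean rate: {mean_rate_tmol_per_yr:.2f} Tmol per year")
```

The even-spread average falls between the 0.5 and 0.9 Tmol per year quoted for 3500 and 2800 Ma.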

To  summarize,  from  both  geological  and  biological 
points  of  view,  there  are  highly  plausible  mechanisms 
by  which  substantial  amounts  of  organic  carbon  could 
be  produced  and  buried  in  accord  with  the  isotopic 
records,  with  complementary  releases  of  oxidizing 
power,  without  any  requirement  for  generation  of  O2 
during the interval 3800-2800 Ma. The most likely
redox partner for C during this interval is Fe, either
directly, via phototrophic oxidation of dissolved Fe2+,
and/or indirectly, with H2 shuttling electrons from
Fe2+ in seafloor basalts to phototrophic and
chemoautotrophic producers.

(e)  2800-1800  Ma 

The next billion years begin with the apparent onset of
oxygenic photosynthesis (Hayes 1983, 1994; Summons
et al. 1999), include a period during
which the abundance of 13C in sedimentary carbonates
was sometimes markedly enriched (Karhu & Holland
1996; Melezhik et al. 1999), and end with the likely
onset of sulphidic conditions in the deep sea (Canfield
1998).

Estimates of f derived from the isotopic records
provide scant evidence that the development of
oxygenic photosynthesis, for all its magnificence as a
biochemical innovation, provided a significant increase
in the net release of oxidizing power. If anything, f
initially declined, a feature marked by the notch visible
at 2.8 Ga in figure 9a. By 2500 Ma, f appears to have
risen to 0.16, a value only slightly higher than that
estimated for the early Archaean. At 2450 Ma, when
traces of O2 quench the mass-independent fractionation
of sulphur isotopes, the estimated minimal total
release  of  oxidizing  power  by  the  C  cycle  is 
460  Emol  O2  equiv.  Although  uncertainties  in  that 
total  are  very  large,  it  is  far  short  of  the  capacities  of 
crustal  Fe  and  S  to  supply  electrons  (table  2).  Levels  of 
O2  in  the  environment  would  have  been  sharply  limited 
by  the  strength  of  those  sinks. 

What, then, happened between 2400 and 2000 Ma?
Contents of 13C in carbonates are frequently elevated,
occasionally exceeding +10‰ (figure 8). Accepting
these compositions as representative of the global pool
of oceanic DIC would imply a near-trebling of f.
Writing three years after Karhu & Holland (1996)
published ‘Carbon isotopes and the rise of atmospheric
oxygen’, Melezhik et al. (1999) drew on additional data
to define three apparently separate pulses of isotopic
enrichment. That dissection of the record yields the
three spikes in f that are represented by broken lines in
figure 9a (alternative stratigraphic correlations yield
only two spikes; Bekker et al. 2003). The isotopic signal
has  been  associated  with  other  lines  of  geochemical 
evidence  indicating  that  the  atmosphere  became 
more  oxidizing  at  about  the  same  time.  The  ensemble 
has  become  known  as  ‘The  Great  Oxidation  Event’ 
(Holland  2002). 
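
The 'near-trebling' of f implied by carbonates at +10‰ can be illustrated with the same mass balance used for the early Archaean. Both the mantle input value (-5‰) and the carbonate-organic isotopic difference (30‰) in the sketch below are assumptions adopted here for illustration. Neither is stated in this passage, and the result shifts if other values are chosen.

```python
# Burial fraction implied by 13C-enriched carbonates, in permil units:
#   f = (delta_carb - delta_input) / epsilon
# where epsilon is the carbonate-minus-organic isotopic difference.
DELTA_INPUT = -5.0        # ASSUMED mantle-carbon value
EPSILON = 30.0            # ASSUMED carbonate-organic difference
F_AT_2500_MA = 0.16       # from the text
DELTA_CARB_EXCURSION = 10.0   # enriched carbonates, from the text


def burial_fraction(delta_carb, delta_input, epsilon):
    """Return the organic burial fraction f for a given carbonate value."""
    return (delta_carb - delta_input) / epsilon


f_excursion = burial_fraction(DELTA_CARB_EXCURSION, DELTA_INPUT, EPSILON)
increase_factor = f_excursion / F_AT_2500_MA

print(f"f during excursion: {f_excursion:.2f}")
print(f"increase relative to 0.16: {increase_factor:.1f}x")
```

Under these assumptions f rises to 0.5, roughly three times the value estimated for 2500 Ma.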

One  view  of  the  scale  of  the  event  is  provided  by  the 
broken  line  in  figure  9b.  If  the  isotopic  signals  represent 
pulses  of  carbon  burial,  the  integrated  production  of 
organic  carbon  by  2000  Ma  would  be  600  Emol 
greater.  As  shown  by  table  2,  that  is  more  than  enough 
to provide the present inventories of SO4^2- and O2. But
there  are  problems  with  interpreting  these  isotopic 
variations  in  terms  of  carbon  burial. 

(i)  There  is  no  parallel  isotopic  enrichment  in  the 
organic  carbon,  as  would  be  expected  if  the 
carbonate  represented  the  oceanic  DIC  that  was 
the  carbon  source  for  primary  producers. 
Karhu  &  Holland  (1996)  searched  for  a  signal 
in  the  organic-carbon  record  and  found  its 
absence  ‘rather  odd’.  If  the  record  were  not  so 
fragmentary  and  noisy  (preservation  is  a  much 
greater  problem  for  organic  carbon  than  for 
carbonate  minerals)  that  the  problem  could  be 
set  aside  (Melezhik  et  al.  1999),  this  alone  would 
be  fatal  to  the  interpretation. 

(ii)  Evidence  is  lacking  for  the  large  deposits  of 
organic  carbon  that  should  have  formed 
(Melezhik  et  al.  1999;  Aharon  2005). 
(iii) The sequence of isotopic signals is reversed from
that expected. If the oxygenation that cut off
mass-independent fractionation of sulphur isotopes
was due to an organic-carbon-burial
event, the disappearance of the mass-independent
fractionation of sulphur isotopes should
not precede the first carbon-isotopic enrichments.

(iv)  It  is  difficult  to  envision  supplies  of  nutrients 
great  enough  to  sustain  the  levels  of  productivity 
required  to  supply  the  organic  carbon.  Aharon 
(2005)  has  discussed  supplies  of  phosphate  very 
insightfully,  concluding  that  efficient  stripping 
of  P  from  organic  matter  prior  to  burial  provides 
the  only  solution.  To  achieve  the  projected 
values of f, the C : P ratio in buried organic
matter  would  have  to  be  4000  and  the  organic- 
preservation  rate  would  have  to  be  6.6%,  about 
fivefold  higher  than  that  observed  in  the  Black 
Sea. 

(v) Stratigraphic correlations are not secure enough
to demonstrate that the various isotopic excursions
are coeval, and thus necessarily linked to
variations in a global reservoir (Aharon 2005).

(vi) Most of the units displaying the isotopic
enrichments are dolostones with substantial
isotopic variability. In their list of 12 ‘major
localities,’ Melezhik et al. (1999) report within-unit
ranges of 3-13‰ and an average range of
7‰. Consistent with this, for the interval
2350-2000 Ma (inclusive), Shields & Veizer
(2002) report 589 δ-values between -2.5 and
+2.5‰ and 424 between +5 and +13‰. That
is, ‘normal’ isotopic abundances are more
common than elevated abundances.

(vii)  When  complete  chemical  analyses  of  the 
carbonates  are  reported,  they  often  include 
substantial concentrations of SiO2 (to 40%;
Melezhik  et  al.  1999).  Stratigraphic  columns 
indicate  that  some  of  the  isotopically  enriched 
carbonates  are  closely  interbedded  with  shales 
(Buick  et  al.  1998),  others  are  described  as 
‘nodular’  (Bekker  et  al.  2003). 

The latter features (vi) and (vii) are more characteristic
of diagenetic carbonates than of marine limestones
faithfully carrying records of oceanic DIC.
When it is recalled that δ13C values up to +13‰ are
common in much younger dolomites that have been
affected by methanogenic diagenesis (Klein et al. 1999;
Mazzullo 2000), that alternative interpretation
demands attention. In fact, attribution of the isotopic
signals to methanogenic diagenesis has already been
favoured by several sets of authors (Yudovich et al.
1991; Dix et al. 1995).

In  all  likelihood,  the  diagenetic  alternative  has  failed 
to  win  popularity  because  an  alternative  does  not 
appear  to  be  needed.  Multiple,  convergent  lines  of 
evidence — independent  of  carbon-isotopic  signals — 
indicate  that  thresholds  of  environmental  oxidation 
were  crossed  during  this  time  interval  (Bekker  et  al. 
2004).  When  the  carbon-isotopic  record  fits  into  this 
picture  at  least  roughly,  why  not  include  it?  Even  more 
to  the  point,  given  the  apparent  reality  of  the  oxidation, 
how else should the carbon-isotopic enrichments be
explained? They are temporally associated, dramatic
and numerous (Bekker et al. 2003), globally distributed,
and have the right polarity (enrichment rather
than  depletion). 

(f) Mechanisms of oxygenation

The  problem  lies  not  in  accepting  the  isotopic  signals  as 
markers  of  the  oxidation  but  in  assuming  that  they 
represent  the  cause.  This  is  a  key  distinction.  Either  the 
steady  functioning  of  the  carbon  cycle  catalysed 
biological  and  ecological  developments  that  shifted 
the  relative  strengths  of  oxygen  sources  and  sinks 
(Holland  1978)  or  dramatic  changes  in  carbon  fluxes 
were  required.  If  the  former,  the  carbon-isotopic 
enrichments  are  environmental  reporters  comparable 
to  the  mass-independent  fractionations  of  sulphur 
isotopes.  If  the  latter,  they  point  to  increased  burial  of 
organic  carbon  but  not  necessarily  to  permanent 
changes  in  oxidation. 

Flux-driven  changes  are  subject  to  reversal.  Buried 
organic  material  will  be  recycled  by  erosion  and 
volcanism.  As  indicated  by  the  downward  trend  of 
the  broken  line  in  figure  9b  after  2000  Ma,  oxidants  will 
be  consumed.  Even  if  a  rise  in  O2  were  driven  by 
increases in J_ob, some additional change would be
required to stabilize the transition and make it
permanent. Geophysical phenomena such as rifting
and increased sedimentation can increase J_ob but are
inherently  cyclical.  What  they  accomplish  in  one  epoch 
will  be  undone  in  another.  Increased  burial  of  organic 
carbon  is  a  half  answer.  It  can  push  O2  levels  higher,  but 
maintaining  them  will  require  some  further  change,  a 
second  half  to  the  answer. 

Unidirectional  change,  a  permanent  strengthening 
of  the  source  of  oxidants,  is  in  the  realm  of  evolutionary 
biology.  Physiological  and  ecological  changes  could 
stabilize  higher  levels  of  O2.  Do  they  serve  as  the  second 
half  of  the  answer?  If  so,  they  are  the  biological  results 
of  geophysical  stimuli  and  we  need  an  explanation  for 
the  linkage.  Or  are  physiological  and  ecological  changes 
answers  in  themselves?  If  so,  advances  in  oxygenation 
have  largely  biological  origins  and  combinations  of 
disparate  phenomena  are  not  required. 

Kinetic  factors  are  also  pertinent.  In  discussing  the 
carbon  cycle,  it  is  common  to  distinguish  between  the 
fast,  biological  cycle  of  photosynthesis  and  respiration 
and  the  slow,  geological  cycle  in  which  erosion, 
weathering  and  volcanism  are  balanced  by  carbon 
burial.  In  global  chemical  terms,  O2  is  a  highly  reactive, 
transient  intermediate  that  is  produced  and  consumed 
within  the  exogenic  reaction  chamber.  Its  steady-state 
abundance  must  depend  on  the  relative  strengths  of 
sources and sinks in that system. In contrast, sedimentary
carbonates and organic materials are outputs from
that  system.  Their  relative  abundances  are  controlled 
by  geophysical  factors  that  vary  over  relatively  long 
time-scales. 

(g)  Causes  of  the  carbon-isotopic  transient 

We  can  suggest  a  sequence  of  biologically  driven 
environmental  changes,  associated  with  a  rise  in  levels 
of  O2,  that  would  produce  the  observed  isotopic  signals 
without  requiring  enhanced  burial  of  organic  carbon. 

Figure 10. Summary of a sequence of conditions that could account for the abundance of ¹³C-enriched sedimentary carbonates
2400-2080 Myr ago.


They  would  also  account  for  the  absence  of  congruent 
variations  in  the  organic-carbon  isotopic  record. 

Especially  prior  to  2.3  Ga,  methanogenic  diagenesis 
must have been common. If electrons for biosynthesis
were supplied mainly by Fe²⁺, the insolubility of the
oxidized product, Fe³⁺, meant that electron acceptors
were rare in most surface environments (Walker 1987).
When  respiring  heterotrophs  are  excluded,  the 
recycling  of  organic  carbon  in  microbial  communities 
will  be  managed  by  fermenters,  with  methanogens 
playing  a  vital  role.  This  will  have  occurred  in  the  same 
zones  that  are  now  occupied  by  aerobes.  These 
methanogenic  communities  will  have  exchanged  CO2 
freely  and  directly  with  the  oceanic  water  column,  and, 
as  a  result,  the  isotopically  enriched  pools  of  CO2  that 
are  characteristic  of  methanogenic  diagenesis  in 
modern  environments  will  not  have  developed.  This 
situation  is  summarized  in  the  ‘before’  segment  of 
figure  10. 

The  development  of  oxygenic  photosynthesis  by  at 
least  2700  Ma  began  to  change  that  situation,  however 
slowly. Particularly near oxygen-producing communities
(many of the isotopically enriched dolostones are
associated  with  stromatolites;  Melezhik  et  al.  1999), 
methanogens  will  eventually  have  been  pushed  to 
deeper  levels  in  the  sediment.  When  this  occurred, 
apparently  beginning  at  about  2400  Ma,  isotopically 
enriched pools of porewater DIC will have developed.
As  carbon  was  exchanged  with  the  carbonate  minerals 
in  the  sediment  (a  process  catalysed  by  the  organic 
acids  produced  during  fermentation),  the  isotopic 
signal  characteristic  of  methanogenesis  will  have  been 
recorded  for  the  first  time.  Within  about  400  Myr,  a 
second  event  shut  the  signal  off.  Fundamentally,  it 
must  have  been  some  weakening  of  the  supply  of 
organic  material  to  methanogenic  communities.  As 
suggested  in  the  ‘after’  segment  of  figure  10,  it  could 
have  been  caused  by  a  further  rise  in  the  availability  of 
electron  acceptors.  Alternatively,  the  tight  association 
between  stromatolitic  producers  and  methanogenic 
consumers, indicated by many of the ¹³C-enriched
sequences, might have been destabilized by environmental
changes.

As  in  the  Neoproterozoic,  the  isotopic  excursions  are 
associated  with  apparently  extreme  glaciations  (Young 
et al. 2001; Tajika 2003). The effects of those
glaciations on global redox geochemistry, if any, are
unclear.  It  has  conversely  been  proposed  that  the 
glaciations  were  caused  by  global  oxidation  which 
destroyed  a  methane  greenhouse  and  that  this  occurred 
promptly after the development of oxygenic photosynthesis
(Kopp et al. 2005). The second part of this does
not  fit  with  biomarker  records  that  place  the  advent 
of  oxygenic  photosynthesis  at  or  before  2.7  Ga 
(Summons  et  al.  1999;  Brocks  et  al.  2003;  Eigenbrode 
2004).  It  also  does  not  fit  with  the  inescapably  slow 
effects  of  carbon  cycling.  Releases  of  oxidizing  power 
have been continuous (figure 9b). But the oxidized
products  accumulated  by  2.5  Ga  cannot  have 
accounted  for  more  than  a  fraction  of  the  inventory  of 
ferric  iron  and  almost  none  of  the  sulphate  (cf.  table  2). 
In  those  circumstances,  by  what  mechanism  did  the 
development  of  oxygenic  photosynthesis  promptly 
sustain  steady-state  concentrations  of  O2  high  enough 
to  destroy  the  methane  greenhouse?  Until  that  question 
can be answered, the evidence for (i) a 400 Myr delay
between  the  evolutionary  event  and  its  environmental 
impact  and  (ii)  the  continuing  capacity  of  Fe  and  S  as 
redox  buffers  should  overrule  speculation. 

Oxidation  of  S  by  O2  would  have  produced  sulphate 
in  surface  environments.  However,  by  1800  Ma,  the 
estimated  integrated  release  of  oxidizing  power 
(figure 9b) was only 820 Emol O2 equiv. This is still
well below the low estimate of the amount required to
build the crustal inventory of Fe³⁺ (table 2). As
suggested  by  Canfield  (1998),  therefore,  it  is  likely 
that  production  of  sulphate  mainly  provided  a  route, 
via  sulphate-reducing  bacteria,  to  delivery  of  excess 
sulphide  to  deep  ocean  waters  and  the  consequent 
cessation  of  deposition  of  banded  iron  formations. 

(h)  1800  Ma-present 

Carbon-isotopic  records  from  the  Neoproterozoic 
onward are good enough that values of f can be
estimated with some confidence (Hayes et al. 1999).
The results are crudely summarized in figure 9a. When
these values of f are applied to the carbon fluxes
estimated here, the projected final total release of
oxidizing power is 1920 Emol O2 equiv. This is close to
the average estimate of the total release represented by
the sum of crustal Fe³⁺, SO₄²⁻ and O2.

The agreement is notable, but discrepancies within
the  redox  accounts  are  more  impressive.  Estimates  of 
crustal  inventories  of  oxidants  vary  by  a  factor  of  1.7. 
Similarly,  the  estimated  crustal  inventories  of  organic 
carbon  fall  short  (perhaps  by  two  times)  of  the  summed 
inventories  of  oxidized  products  and  of  the  estimated 
total burial of organic carbon (figure 9b). At face value,
this requires that reduced products have been exported
from the crust more efficiently than oxidized products.
Unless organic carbon could be exported to the mantle
more efficiently than Fe³⁺, we are practically required
to  attribute  such  an  imbalance  to  loss  of  H2  from 
the  top  of  the  atmosphere.  The  redox  imbalance  and 
the  likely  importance  of  H2  loss  have  been  incisively 
examined  and  elaborated  by  Catling  et  al.  (2001)  and 
by  Catling  &  Claire  (2005).  They  associate  loss  of  H2 
mainly  with  atmospheric  CH4  and  thus  with  the 
Archaean.  By  contrast,  Tian  et  al.  (2005)  have  noted 
the possible importance of the low exobase temperature
of an anoxic atmosphere in limiting the rate of H2
escape  from  the  early  Earth.  They  contend  that  higher 
levels  of  O2  in  the  atmosphere  after  2.4  Ga  would  have 
increased  exobase  temperatures  and  promoted  escape 
of  hydrogen.  Resolution  of  the  history  of  hydrogen 
escape  will  constrain  the  second  integral  on  the  right- 
hand  side  of  equation  (2.2),  and  be  a  key  step  in 
reconstructing the time course of A_Ox.

The dearth of reduced products becomes particularly
notable when the likely integrated effects of
subduction,  represented  by  the  third  integral  in 
equation  (2.2),  are  considered.  Lecuyer  &  Rickard 
(1999)  contend  that  the  net  rate  at  which  subduction  is 
transferring Fe³⁺ from the crust to the mantle is
presently 7 Tmol yr⁻¹, or 1.8 Tmol O2 equiv. yr⁻¹. If
the  continents  are  at  steady  state  with  respect  to  organic 
carbon,  so  that,  in  equation  (6.2), /7AI+7Mo  =  0,  the 
rate  at  which  the  carbon  cycle  is  releasing  oxidizing 
power  is  given  by  (cf.  equation  (6.2)) 

\[ f\,[J_{am} - (J_{ad} + J_{od})] = 0.24\,(2.2 - 1.2) = 0.2\ \mathrm{Tmol\ O_2\ equiv.\ yr^{-1}}, \qquad (6.3) \]

where  the  values  substituted  are  centre  points  of 
the  ranges  estimated  in  earlier  sections  of  this  paper. 
Since the net rate of subduction of Fe³⁺ proposed by
Lecuyer  &  Rickard  (1999)  greatly  exceeds  this  value,  it 
is  either  incorrect  or  represents  a  temporary  situation. 
Attention to subduction of Fe³⁺ is, however, a very
good  idea.  The  continuing  release  of  oxidizing  power, 
coupled  with  the  minimal  variations  in  atmospheric 
levels  of  O2  during  the  Phanerozoic  (Berner  2004)  and 
balanced  exchange  of  oxidizing  power  by  the  cycles  of 
carbon  and  sulphur  during  the  same  interval  (Canfield 
2004),  requires  a  continuing  sink.  It  seems  inescapable 
that this is supplied by oxidation of Fe²⁺ at the
seafloor. 
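
The comparison drawn around equation (6.3) can be checked with a few lines of arithmetic. The sketch below is illustrative only. The names are ours, the values 2.2 and 1.2 are the centre-point values substituted in equation (6.3) (assumed here to be in Tmol C yr⁻¹), and the factor of four used to express Fe³⁺ in O2 equivalents assumes simple electron bookkeeping (O2 accepts four electrons, oxidation of Fe²⁺ to Fe³⁺ donates one).

```python
# Illustrative check of equation (6.3). All names are hypothetical.

F_ORG = 0.24            # f: fraction of carbon buried in organic form
J_AM = 2.2              # centre-point value for J_am (assumed Tmol C/yr)
J_AD_PLUS_J_OD = 1.2    # centre-point value for J_ad + J_od (assumed Tmol C/yr)
FE3_SUBDUCTION = 7.0    # Tmol Fe3+/yr, the rate attributed to Lecuyer & Rickard (1999)
ELECTRONS_PER_O2 = 4    # assumption: O2 accepts 4 e-, Fe2+ -> Fe3+ donates 1


def carbon_cycle_release(f, j_am, j_return):
    """Net release of oxidizing power, f[J_am - (J_ad + J_od)], in Tmol O2 equiv./yr."""
    return f * (j_am - j_return)


def fe3_to_o2_equiv(fe3_flux):
    """Convert a ferric-iron flux to O2 equivalents on an electron basis."""
    return fe3_flux / ELECTRONS_PER_O2


release = carbon_cycle_release(F_ORG, J_AM, J_AD_PLUS_J_OD)
subduction_sink = fe3_to_o2_equiv(FE3_SUBDUCTION)
ratio = subduction_sink / release

print(f"carbon-cycle release : {release:.2f} Tmol O2 equiv./yr")
print(f"Fe3+ subduction sink : {subduction_sink:.2f} Tmol O2 equiv./yr")
print(f"sink / release       : {ratio:.1f}")
```

Under these assumptions the proposed subduction sink is roughly seven times the release computed from equation (6.3), which is the mismatch the text describes.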

Bounds  can  be  placed  on  the  extent  of  this  transfer 
over  Earth  history,  since  iron  returning  to  the  mantle 
with a higher Fe³⁺/ΣFe than it emerged with will result
in  oxidation  of  the  upper  mantle.  There  is  a  growing 
consensus  that  the  redox  state  of  the  upper  mantle,  as 
measured  by  its  oxygen  fugacity,  has  not  changed 
greatly  over  Earth  history.  The  recent  constraint  of  Li  & 
Lee  (2004),  from  the  V/Sc  systematics  of  MORBs, 
indicates that the oxygen fugacity of the mantle source
has  increased  by  at  most  0.3  log  unit  since  3.5  Ga. 
Using  the  relation  of  Kress  &  Carmichael  (1991),  this 
translates  into  a  maximal  6%  increase  in  the  ferric  iron 
content  of  the  mantle,  if  all  of  that  increase  were 
attributable to addition of Fe³⁺. Taking the mass of the
upper mantle to be 1.34 × 10²⁷ g (Ballentine et al.
2002), the mantle Fe content to be 6.3 wt% (Palme &
O’Neill 2003) and the mantle Fe³⁺/ΣFe to be 0.12
(Bezos & Humler 2005), this yields a maximal input of
1.1 × 10²² mol of ferric iron since 3.5 Ga, or a
maximum average input rate of 3 Tmol Fe³⁺ yr⁻¹
(0.75  Tmol  O2  equiv.).  This  is  comfortably  within 
potential  loads  imposed  by 
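
The upper bound on ferric-iron input to the mantle can be reproduced numerically. The following is a rough sketch, not the authors' own calculation. It assumes an upper-mantle mass of 1.34 × 10²⁷ g, takes the 6.3 wt%, 0.12 and 6% figures from the text, and uses the standard atomic weight of iron (55.845 g mol⁻¹), which the text does not state. All names are hypothetical.

```python
# Rough reproduction of the mantle ferric-iron bound. Names are hypothetical.

UPPER_MANTLE_MASS_G = 1.34e27   # assumed reading of the upper-mantle mass, in grams
FE_MASS_FRACTION = 0.063        # mantle Fe content, 6.3 wt%
FE3_OVER_TOTAL_FE = 0.12        # mantle Fe3+/total Fe
MAX_FRACTIONAL_INCREASE = 0.06  # maximal 6% increase in ferric iron content
FE_MOLAR_MASS = 55.845          # g/mol, standard atomic weight (not given in the text)
INTERVAL_YR = 3.5e9             # time since 3.5 Ga
ELECTRONS_PER_O2 = 4            # assumption: 4 mol Fe3+ per mol O2 equivalent


def max_ferric_input_mol():
    """Largest addition of Fe3+ (mol) compatible with the assumed 6% increase."""
    total_fe_mol = UPPER_MANTLE_MASS_G * FE_MASS_FRACTION / FE_MOLAR_MASS
    ferric_mol = total_fe_mol * FE3_OVER_TOTAL_FE
    return ferric_mol * MAX_FRACTIONAL_INCREASE


total_input = max_ferric_input_mol()            # mol Fe3+
rate_tmol = total_input / INTERVAL_YR / 1e12    # Tmol Fe3+ per year
rate_o2_equiv = rate_tmol / ELECTRONS_PER_O2    # Tmol O2 equiv. per year

print(f"maximal Fe3+ input : {total_input:.2e} mol")
print(f"average input rate : {rate_tmol:.1f} Tmol Fe3+/yr")
print(f"as O2 equivalents  : {rate_o2_equiv:.2f} Tmol O2 equiv./yr")
```

With these inputs the sketch gives about 1.1 × 10²² mol and about 3 Tmol Fe³⁺ yr⁻¹, matching the rounded values quoted in the text.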

The  agreement  between  the  projected  release  of 
oxidizing  power  and  the  average  estimate  of  oxidized 
products  is  far  from  comforting.  Because  considerable 
amounts  of  Fe^"*"  must  have  been  exported  to  the 
mantle,  the  total  release  of  oxidizing  power  by  the 
carbon  cycle  should  exceed  considerably  the  inventory 
of Fe³⁺ remaining in the crust. Part of any shortfall can
be accommodated by recalling that the present
estimates are minima. Finite values of J_od and a failure
to reoxidize organic carbon in J_or will increase A_Ox.
The reburial of organic carbon can be treated
quantitatively. If a portion, x, of J_or is simply reburied,
two things will happen: (i) since it does not re-enter the
exogenic system as CO2, that C will be unavailable for
reduction. The organic burial flux and thus the overall
production of oxidizing power will be decreased by an
amount f·x·J_or; (ii) the oxidizing power consumed by
oxidation of J_or will be reduced by an amount x·J_or. The
first term is a loss of oxidizing power, the second is a
gain. The difference is (1 − f)·x·J_or. Unfortunately, this
term  cannot  simply  be  added  to  equation  (6.2). 
Although  it  would  account  for  the  carbon  and  redox 
balances  in  a  single  increment  of  time,  the  proper 
treatment  of  subsequent  increments  would  require  that 
the  reburied  material  be  followed,  with  appropriate 
adjustments being made to sedimentary and continental
inventories. Even during the Phanerozoic, the
presence  of  detrital  coal  and  kerogen  in  marine 
sediments  (Sackett  et  al.  1974)  shows  that  x  is  not 
zero. On the other hand, it is known that microorganisms
in weathering profiles incorporate carbon
even  from  refractory  kerogens  (Petsch  et  al.  2001)  and 
that  the  great  bulk  of  organic  carbon  in  marine 
sediments  is  not  recycled,  detrital  organic  material. 
Under  anaerobic  conditions,  however,  reburial  of 
organic  carbon  is  likely  to  have  been  more  important. 

Can  the  carbon  cycle  operate  at  a  point  of  redox 
neutrality,  neither  releasing  nor  consuming  oxidants, 
even  when  isotopic  compositions  indicate  significant 
rates  of  burial  of  organic  carbon?  The  answer  is  in 
equation  (6.3).  Even  if  burial  and  recycling  are 
balanced, redox neutrality still requires J_am = J_ad + J_od.
In other words, inputs from the mantle are equal to
outputs  to  the  mantle.  This  is  not  a  likely  coincidence. 
Carbon flows into subduction zones not just from J_am
but from multiple sources. If the portion of J_as + J_os +
J_az equal to J_am is to be returned to the mantle while the
balance flows to J_av and J_ov, a very clever demon will be
required to operate the carbon valve in the subduction
zone. Notably, f (and thus δ_ab − δ_i) is not a measure of
J_am − (J_ad + J_od). No particular value of δ_ab can be
recognized as indicating redox neutrality. This yields
two  fundamental  insights.  First,  caution  is  required 
when  interpreting  the  isotopic  record  in  terms  of  redox 
variations.  Second,  the  conditions  required  for  redox 
neutrality  are  precise  and  improbable. 


7.  CONCLUDING  REMARKS 

We  have  taken  a  new  approach  to  mass  and  redox 
balances  involving  the  C  cycle.  Its  principal  strength  is 
a  realistic  acknowledgement  of  the  importance  of 
inputs  from  and  outputs  to  the  mantle.  This  broader 
context  has  invited  consideration  of  processes  that  may 
otherwise  be  overlooked  or  misconstrued.  Its  weakness 
is  that  our  knowledge  of  some  of  the  key  processes  is 
presently  imprecise.  We  need  to  learn  more  about 
inputs  of  carbon,  hydrogen  escape  and  processes  in 
subduction  zones  to  better  evaluate  each  component  of 
A_Ox. Even now, however, the resulting uncertainties do
not cripple the approach. The key finding is dramatic:
the persistence of δ_ab at values near zero rather than
−5‰ can be interpreted only in terms of continuous
and  substantial  releases  of  oxidizing  power  by  the 
carbon  cycle.  Two  points  follow. 

The  oxidation  of  the  crust  has  been  more  continuous 
than  episodic.  Increases  in  steady-state  levels  of 
oxidants  derive  either  from  combinations  of  geological 
and  biological  changes  or  from  biological  changes 
alone.  The  scientific  problem  of  reconstructing  Earth’s 
oxidation  is  more  biological  than  geochemical. 

Continuous  sinks  for  oxidizing  power  are  required, 
at  least  since  atmospheric  levels  of  O2  stabilized  at 
current  levels  approximately  550  Ma.  Since  exchanges 
of  oxidizing  power  between  the  carbon  and  sulphur 
cycles  have  been  balanced  during  that  same  time 
interval,  this  can  have  been  provided  only  by  the 
oxidation  of  iron. 

Together,  these  affect  our  view  of  the  carbon  cycle.  It 
should  not  be  seen  as  the  competent  and  versatile 
manager  of  an  electron-transfer  market,  effortlessly 
balancing  offers  and  bids.  Instead,  it  is  a  wrong-way 
passenger  on  a  downgoing  escalator.  It  continuously 
receives  new  CO2  from  the  mantle,  relentless  supplies 
of  solar  energy  and  nutrients  leached  from  deposits  that 
have  retained  their  organic  carbon.  To  hold  its  place,  it 
must  constantly  produce  new  organic  matter.  While 
doing  so,  it  releases  oxidants  that  threaten  to  reverse  its 
accomplishments. The stability of δ_ab is an indicator of
the success of its evolutionary strategies. The stability
of P_O2 is evidence of an equally competent sink.

The  seeds  from  which  the  ideas  discussed  have  grown  were 
planted when J.M.H. looked out of Alvin's viewports and
observed  the  H2-rich  waters  of  the  Lost  City  hydrothermal 
vent  system.  The  generosity  of  Deborah  Kelley  and  Jeffrey 
Karson,  the  co-chief  scientists  of  the  Lost  City  cruise,  is 
greatly  appreciated.  Since  then,  we  have  benefited  from 
helpful  and  stimulating  exchanges  with  Stan  Hart,  Don 
Anderson,  David  Rowley  and  Alberto  Saal.  We  are  very 
grateful  for  critical  reviews  of  this  manuscript  provided  by 
David  Catling  and  Jennifer  Eigenbrode.  J.M.H.  receives 
support  as  a  member  of  the  Astrobiology  team  led  by 
S.  D’Hondt,  University  of  Rhode  Island.  J.R.W.  is  supported 
by  an  NDSEG  Graduate  Fellowship  from  the  Office  of  Naval 
Research. 

There is a close connection between modern-day biosynthesis of particular triterpenoid biomarkers
and  presence  of  molecular  oxygen  in  the  environment.  Thus,  the  detection  of  steroid  and  triterpenoid 
hydrocarbons  far  back  in  Earth  history  has  been  used  to  infer  the  antiquity  of  oxygenic 
photosynthesis.  This  prompts  the  question:  were  these  compounds  produced  similarly  in  the  past? 
In  this  paper,  we  address  this  question  with  a  review  of  the  current  state  of  knowledge  surrounding  the 
oxygen  requirement  for  steroid  biosynthesis  and  phylogenetic  patterns  in  the  distribution  of  steroid 
and  triterpenoid  biosynthetic  pathways. 

The  hopanoid  and  steroid  biosynthetic  pathways  are  very  highly  conserved  within  the  bacterial  and 
eukaryotic  domains,  respectively.  Bacteriohopanepolyols  are  produced  by  a  wide  range  of  bacteria, 
and  are  methylated  in  significant  abundance  at  the  C2  position  by  oxygen-producing  cyanobacteria. 
On  the  other  hand,  sterol  biosynthesis  is  sparsely  distributed  in  distantly  related  bacterial  taxa  and  the 
pathways  do  not  produce  the  wide  range  of  products  that  characterize  eukaryotes.  In  particular, 
evidence  for  sterol  biosynthesis  by  cyanobacteria  appears  flawed.  Our  experiments  show  that 
cyanobacterial  cultures  are  easily  contaminated  by  sterol-producing  rust  fungi,  which  can  be 
eliminated  by  treatment  with  cycloheximide  affording  sterol-free  samples. 

Sterols  are  ubiquitous  features  of  eukaryotic  membranes,  and  it  appears  likely  that  the  initial  steps 
in  sterol  biosynthesis  were  present  in  their  modern  form  in  the  last  common  ancestor  of  eukaryotes. 
Eleven  molecules  of  O2  are  required  by  four  enzymes  to  produce  one  molecule  of  cholesterol. 
Thermodynamic  arguments,  optimization  of  function  and  parsimony  all  indicate  that  an  ancestral 
anaerobic  pathway  is  highly  unlikely. 

The  known  geological  record  of  molecular  fossils,  especially  steranes  and  triterpanes,  is  notable  for 
the  limited  number  of  structural  motifs  that  have  been  observed.  With  a  few  exceptions,  the  carbon 
skeletons  are  the  same  as  those  found  in  the  lipids  of  extant  organisms  and  no  demonstrably  extinct 
structures  have  been  reported.  Furthermore,  their  patterns  of  occurrence  over  billion  year  time-scales 
correlate  strongly  with  environments  of  deposition.  Accordingly,  biomarkers  are  excellent  indicators 
of  environmental  conditions  even  though  the  taxonomic  affinities  of  all  biomarkers  cannot  be 
precisely  specified.  Biomarkers  are  ultimately  tied  to  biochemicals  with  very  specific  functional 
properties,  and  interpretations  of  the  biomarker  record  will  benefit  from  increased  understanding  of 
the  biological  roles  of  geologically  durable  molecules. 

Keywords:  Archaean;  biomarker  hydrocarbons;  steroids;  sterols;  triterpenoids; 
hopanes; aerobic biosynthesis


1.  INTRODUCTION 

(a)  The  advent  of  oxygenic  photosynthesis  and 
an  oxygen-rich  atmosphere 

The  compositional  and  evolutionary  history  of  the 
atmosphere-ocean  system  may  be  reconstructed  from 
the  chemistry  and  habit  of  sedimentary  minerals,  basalt 
weathering  profiles  and  secular  change  in  stable 
isotopes, along with numerical modelling and theoretical
considerations (Cloud 1972). According to the
paradigm  of  Cloud,  Holland  and  Walker,  oxygen  was  a 
trace component of the early atmosphere and rose,
within  weakly  constrained  bounds,  to  within  one-tenth 
of  the  present  atmospheric  level  (PAL)  by  about  540 
Myr  ago  (Ma)  (Cloud  1972;  Walker  et  al.  1983).  It  is 
further  hypothesized  that  oxygen  played  a  role  in  the 
deposition  of  Archaean  and  Palaeoproterozoic  banded 
iron  formations,  and  that  the  fluxes  of  reduced  minerals 
and  volcanic  gases  into  the  ocean  and  atmosphere  acted 
as  a  buffer  to  keep  atmospheric  oxygen  concentrations 
low  for  a  protracted  period.  An  apparently  ‘sudden’ 
development  of  oxidized  soil  profiles  about  2300  Ma, 
together  with  carbon,  sulphur  and  iron  isotopic 
indicators,  suggest  that  the  oxygen  rise  was  not  uniform 
but  occurred  in  a  stepwise  manner  (Holland  1984; 
Holland  &  Beukes  1990;  Des  Marais  et  al.  1992; 
Karhu & Holland 1996; Rasmussen & Buick 1999; Des
Marais 2000; Farquhar et al. 2000a,b; Holland 2002;
Rouxel  et  al.  2005).  The  ‘step’  which  occurred  just 
prior to 2300 Ma was to greater than 10⁻⁵ PAL and
accompanied  by  dramatic  environmental  changes 
indicated by large excursions in the ¹³C content of
marine carbonates, and possible ‘snowball Earth’
events  (Kirschvink  et  al.  2000;  Bekker  et  al.  2004).  A 
second  very  significant  rise  in  the  Late  Neoproterozoic 
also  probably  took  place  in  stages  punctuated  by 
multiple  ice  ages  with  dramatic  swings  in  carbon  and 
sulphur  isotopes  indicating  a  radical  reorganization  of 
the  biogeochemical  cycles  (Hoffman  &  Schrag  2002; 
Rothman  et  al.  2003;  Halverson  et  al.  2005;  Hurtgen 
et  al.  2005). 

(b)  Isotopic  and  molecular  evidence  pertaining 
to  timing 

Carbon  and  sulphur  isotopes,  molecular  biomarkers, 
and  possibly  iron  isotopes  suggest  that  the  advent  of 
oxygenic photosynthesis preceded the Early Palaeoproterozoic
‘Great Oxidation Event’ by 400 Myr or
more. Certain 2-methylhopanoid biomarkers that we
associate with modern-day oxygen-producing cyanobacteria
and steroids that require oxygen-utilizing
enzymes for their biosynthesis can be found in rocks
as old as 2715 Myr. This is consistent with the
notion that oxygen production from oxygenic photosynthesis
is indeed an ancient process. The inventory
of  organic  carbon  preserved  in  Middle  to  Late 
Archaean  black  shales  and  carbonates,  indicating 
prolific primary productivity in diverse palaeoenvironments,
is another signal that would be consistent
with  an  early  appearance  of  oxygenic  photosynthesis 
since  electron  donors  other  than  water  might  have 
been  limiting  in  supply  and/or  spatial  distribution  in 
an  anaerobic  surface  ocean  (Walker  et  al.  1983;  Des 
Marais  2000;  Rosing  &  Frei  2004).  Molecular 
hydrogen,  a  feasible  alternative  electron  donor 
produced  by  sub-sea  basalt  weathering,  was  likely 
abundant  in  the  deep  ocean,  available  for  carbon 
fixation  and  possibly  for  phototrophic  carbon  fixation 
(Sleep  et  al.  2004;  Tice  &  Lowe  2004;  Hayes  & 
Waldbauer  2006).  Sulphur  isotopic  data  indicate  that 
sulphate-reducing bacteria were active by 2.7 Gyr ago
(Ga), which indirectly suggests that oxygenic photosynthesis
was extant (Shen et al. 2001; Rouxel et al.
2005). 

Analyses  of  hydrocarbons  from  Fortescue  and 
Hamersley  group  sediments  of  the  Pilbara  Craton, 
western  Australia  (Brocks  et  al.  1999;  Summons  et  al. 
1999;  Eigenbrode  et  al.  2001,  2004;  Eigenbrode 
2004)  have  uncovered  many  of  the  same  kinds  of 
carbon  skeletons  that  are  prevalent  in  Phanerozoic 
sediments  and  petroleum  (Peters  et  al.  2004).  Of 
particular  interest  are  the  steroid  and  triterpenoid 
carbon  skeletons  that  are  apparently  derived  from 
sterols and bacteriohopanepolyols (BHP), respectively.
The presence of these types of hydrocarbons in
Late Archaean sediments, which are considered by
many researchers to be diagnostic for Eukarya and
Bacteria, has been further used as evidence for
the  antiquity  of  both  of  these  domains  (Brocks  et  al. 
1999). 

(c) Hopanoid biomarkers

In  bacteria,  the  functional  forms  of  hopanoids  are 
amphiphilic  BHP.  These  comprise  a  C30  pentacyclic 
triterpene hydrocarbon skeleton, derived from squalene
via the enzyme squalene-hopene cyclase, and are
linked via a C-C bond to a C5 sugar moiety derived
from ribose. The polar moieties of BHP may be
augmented with sugars, amino acids or other functionalized
units. These substituents on BHP evidently serve
a functional role and also provide a chemical mechanism
for their preservation in the geological record. They
facilitate  intermolecular  condensation  reactions,  and 
cross-linking  by  reduced  sulphur  compounds,  that 
result  in  incorporation  into  kerogen  (Wakeham  et  al. 
1995).  The  apolar  ring  system  of  hopanoids  may  be 
modified  by  unsaturation  or  by  addition  of  an  extra 
methyl  group  at  either  position  2  or  position  3  located 
in  the  A-ring.  Although  oxygen  is  not  required  for 
hopanoid  biosynthesis,  the  vast  majority  of  known 
hopanoid  producers  are  aerobic  or  microaerophilic 
bacteria  (Rohmer  et  al.  1984).  These  include  the 
cyanobacteria and α- and β-proteobacteria. Notable
exceptions  include  Geobacter  sp.  (Fischer  et  al.  2005) 
although,  logically,  there  may  be  many  more  obligate  or 
facultative  anaerobes  which  make  BHP,  given  that 
surveys  of  bacterial  taxa  have  been  limited.  Recent 
studies  have  provided  molecular  isotopic  evidence  for 
biosynthesis  of  hopanoids  in  anaerobic  environments 
(Thiel  et  al.  1999,  2003;  Pancost  et  al.  2000). 

Cyanobacteria  are  presently  the  only  known 
bacteria  to  synthesize  abundant  2-methylhopanoids 
with  an  extended  polyhydroxylated  side  chain  (i.e. 
2Me-BHP)  (Rohmer  et  al.  1984;  Summons  et  al. 
1999).  Labelling  experiments  demonstrate  that  the 
2-methyl substituent originates from L-methionine,
presumably via S-adenosylmethionine, with preservation
of all three hydrogens and consistent with
methylation of a Δ²-hopanoid (Zundel & Rohmer
1985). Other details of the biosynthesis and the
specific functions of 2Me-BHP remain to be elucidated.
Cyanobacteria containing 2Me-BHP are distributed
broadly throughout cyanobacterial phylogeny
including  Gloeobacter  violaceus  (a  deeply  branching 
cyanobacterium  lacking  thylakoids)  and  the  N2-fixing 
Nostoc  spp.  (a  heterocystous  filament-former).  While 
marine  cyanobacteria  were  poorly  represented  in  the 
initial  survey  (Summons  et  al.  1999),  more  recent 
work  does  not  suggest  that  2Me-BHP  are  widely 
represented  in  those  marine  cyanobacteria  that  have 
been  brought  into  culture. 

(d)  Steroid  biomarkers 

Sterols  are  derived  from  the  same  squalene  precursor  as 
hopanoids  but,  in  marked  contrast  to  BHP,  are  known 
to  have  an  oxygen-dependent  biosynthesis  beginning 
with  the  formation  of  the  first  intermediate,  2,3- 
oxidosqualene.  Enzymes  involved  in  sterol  biosynthesis 
are  highly  specific  in  their  substrate  requirements  and 
the  mechanism  by  which  oxidosqualene  cyclase  (OSC) 
converts 2,3-oxidosqualene to either of two protosterols,
lanosterol or cycloartenol, depends on precise
control  of  multiple  intermediates  along  the  cyclization 
cascade  (Abe  et  al.  1993;  Wendt  et  al.  1997,  2000).  The 
unravelling  of  the  intimate  details  of  these  exquisitely 
stereospecific reactions, based on more than 50 years
of concerted research since the process was postulated
(Woodward  &  Bloch  1953),  is  considered  one  of  the 
classic  accomplishments  of  molecular  science. 

Sterols  are  characteristic  of  all  Eukarya.  They  are 
not  found  in  Archaea  and  the  proven  occurrences  in 
Bacteria  are  sparsely  distributed  and  yield  a  limited 
array  of  products.  Methylococcus  capsulatus,  Gemmata 
obscuriglobus  and  some  members  of  the  myxobacteria 
are  proven  steroid-producing  bacteria  (Bird  et  al.  1971; 
Kohl  et  al.  1983;  Bode  et  al.  2003;  Pearson  et  al.  2003; 
Volkman  2003,  2005). 

There are at least 11 original studies and numerous
reviews  citing  sterol  occurrence  in  cyanobacteria. 
Prominent  among  these,  a  crystalline  mixture  of  sterols 
was  isolated  from  a  culture  of  Phormidium  luridum  and 
identified  by  gas  chromatography-mass  spectrometry 
(GC-MS) to contain C29Δ7, C29Δ5, C29Δ7,22,
C29Δ5,7,22 and C29Δ5,22 with minor amounts of
cholesterol  (DeSousa  &  Nes  1968).  When  it  was 
conducted,  this  work  stood  in  marked  contrast  to 
earlier  studies  asserting  the  absence  of  sterols  from 
cyanobacteria  (Levin  &  Bloch  1964)  while  other,  more 
recent  studies  have  failed  to  detect  them  (Rohmer  et  al. 
1979,  1984). 

The  reports  of  cyanobacterial  sterols  apply  to  a 
taxonomically  diverse  range  of  cultured  organisms 
including  Spirulina  platensis,  Nostoc  sp.  and  Calothrix 
sp. with C29Δ5, C29Δ7, C29Δ5,22 (Paoletti et al. 1976),
Anabaena sp. (×3), Nodularia sp. and Nostoc sp. with
C29Δ5, C29Δ7, C29Δ5,22, C29Δ5,7,22, C29:0 and
cholesterol (Kohlhase & Pohl 1988), and Anabaena
hallensis with C29Δ5, C28Δ5, C29:0 (Hai et al. 1996).
Lanosterol  has  been  identified  in  Chlorogloeopsis  fritschii 
(Sallal  et  al.  1987)  and  there  are  numerous  reports  of 
sterols  in  natural  samples  such  as  cyanobacterial  mats. 

Given  the  variety  of  organisms  investigated,  and  apart 
from  the  C.  fritschii  results,  there  is  a  striking  similarity  in 
the  sterols  identified  as  well  as  their  relative  abundances. 
It  has  been  noted  that  the  strong  predominance  of 
unsaturated  C29  sterols — phytosterols — is  similar  to 
that  found  in  green  algae  and  vascular  plants.  Moreover, 
of  the  more  than  a  dozen  cyanobacterial  genomes 
completed  to  date  (ranging  from  Gloeobacter  to  Nostoc) 
none  contains  genes  encoding  sterol  synthesis  enzymes. 

(e) Alternative views about oxygen and biomarkers

An  alternate  hypothesis  concerning  the  history  of 
environmental  oxidation  argues  for  a  relatively  late 
(just  prior  to  2.3  Ga)  evolution  of  cyanobacteria  whose 
oxygen-producing  capability  destroyed  a  methane 
greenhouse thereby directly triggering the 2.3-2.2 Ga
Makganyene glaciation. It was recently proposed
(Kopp  et  al.  2005)  that  all  the  ‘sudden’  indicators  of 
environmental  oxidation,  such  as  red  bed  appearance 
and  the  large  attenuation  in  mass-independent  sulphur 
isotope  fractionation  around  this  time  period,  record 
the  inception  of  oxygenic  photosynthesis,  and  that 
there  was  essentially  no  time  lag  between  the  origin  of 
organisms  capable  of  oxygenic  photosynthesis  and  their 
rise  to  ecological  dominance  and  impact  on  global 
geochemistry. 
(f) The present study

The  validity  of  biomarker  methodologies  for  drawing 
inferences  about  biota  and  ocean  redox  in  the  Archaean 
rests  upon  a  number  of  foundations. 

(i)  The  fossilized  lipids,  hydrocarbons,  reported  in 
Archaean  rocks  must  actually  be  ‘Archaean’  in 
age  and  indigenous  to  the  samples  in  which  they 
were  found. 

(ii)  Membrane  lipid  compositions  and  biosynthetic 
pathways  must  be  features  of  cell  biology  that 
are  conserved  over  evolutionary  time,  such  that 
information  about  past  life  can  be  drawn  from 
knowledge  of  modern  organisms. 

(iii)  The  distribution  of  lipid  biosynthetic  pathways 
should  be  phylogenetically  informative,  such 
that  the  presence  of  particular  molecules  in  the 
rock record provides information about contemporaneous
biodiversity.

These  premises  all  deserve  critical  appraisal.  The 
issue  of  syngeneity  has  been  addressed  by  Brocks  et  al. 
(2003)  who  concluded  that  Archaean  biomarkers  were 
‘probably  syngenetic’  with  the  host  rocks.  However, 
they  could  not  totally  exclude  bitumen  migration  from 
younger  sediments  and,  thus,  additional  research  to 
address  this  point  using  freshly  drilled  and  carefully 
curated  drill  cores  is  underway  (Buick  et  al.  2004; 
Summons  et  al.  2004). 

To  address  (ii)  and  (iii)  above,  we  draw  on  the 
inventory  of  protein  sequences  of  key  biosynthetic 
enzymes  for  fresh  insights  about  the  biosynthetic 
oxygen  requirement  for  sterol  synthesis  and  the  status 
of  sterols  as  markers  that  are  specific  for  Eukarya.  We 
also investigated the sterol contents of some cyanobacteria
purported to contain them, including their
capacity for biosynthesis. Lastly, we review studies of
the biosynthesis of 2-methylhopanoids and the phylogenetic
distribution that pertains to oxygen availability.

2.  MATERIAL  AND  METHODS 

(a)  Protein  sequence  and  structure  analysis 

Protein  sequences  and  structures  of  sterol  synthesis  enzymes 
in  various  organisms  were  obtained  from  databases  using  the 
basic  local  alignment  search  tool  (Altschul  et  al.  1997). 
Except  as  noted  below,  sequence  and  structure  data  were 
retrieved  from  databases  searched  through  the  National 
Center  for  Biotechnology  Information  and  Joint  Genome 
Institute  Web  servers.  OSC  sequences  for  Acanthamoeba 
castellanii  and  Hartmannella  vermiformis  were  retrieved  from 
the Protist EST Programme database
(http://amoebidia.bcm.umontreal.ca/public/pepdb/welcome.php);
for Cyanidioschyzon merolae from the C. merolae Genome Project
(http://merolae.biol.s.u-tokyo.ac.jp/) and for G. obscuriglobus from
the Institute for Genomic Research
(http://www.tigr.org/tdb/ufmg/index.shtml). Sequences were aligned using ClustalX
and  alignments  manually  edited  in  GeneDoc,  and  protein 
crystal  structures  were  visualized  with  Cn3D  and  RasMol. 

(b)  Cyanobacterial  lipid  analysis 

(i)  Culture  conditions 

Cyanobacteria  were  grown  within  an  illuminated  incubator 
(12-12  light-dark  cycle)  with  300  ml  of  BG-11  or  D-media 
with addition of filter-sterilized cycloheximide (100 mg l⁻¹).
Anabaena cylindrica (ATCC 27899), P. luridum (UTEX 426),
Fischerella sp. (ATCC 29538), C. fritschii (ATCC 27193),
Gloeobacter  sp.  TS  and  Gloeocapsa  sp.  were  grown  at  30  °C, 
and  the  Yellowstone  isolates  Phormidium  sp.  RC  and  OSS4  at 
35  °C.  Cultures  were  harvested  by  centrifugation  using  Corex 
centrifuge  bottles,  which  were  solvent-rinsed  with  methylene 
chloride  and  methanol  prior  to  use.  Cell  pellets  were  frozen, 
lyophilized and stored at −20 °C.

(ii)  Lipid  analysis 

Total  lipid  was  prepared  from  the  stored  cultures  by  a 
modified  Bligh-Dyer  extraction  (Jahnke  1992)  of  lyophilized 
biomass. Aliquots (300 ml) of BG-11 and medium D were
processed in parallel as control blanks. After addition of an
internal standard (epiandrosterone or cholesterol-D4), small
aliquots (ca 0.1 mg) of the total lipid extracts were hydrolysed
in 1 ml of 0.1 N HCl : methanol (1 : 1) at 60 °C for 2 h. After
removal of solvent, and azeotropic drying with dichloromethane,
the products were derivatized with 25 µl each of
bis(trimethylsilyl)trifluoroacetamide and pyridine with heating
at 70 °C for 30 min and analysed by GC-MS. Sterols were
identified  by  comparison  with  literature  spectra  and  the 
spectra  of  authentic  compounds. 

In  a  corollary  set  of  analyses,  lyophilized  biomass  from 
previous  experimental  protocols,  which  had  been  stored  at 
4 °C, was extracted in a similar manner. PCR amplification
of DNA from P. luridum, C. fritschii, Phormidium RC and
Phormidium OSS4 using a fungal primer (ITS-4B) specific to
Basidiomycetes showed positive bands. Laboratory-maintained
stock cultures of these same cyanobacteria were
negative  using  this  primer  set  (C.  Raleigh  &  K.  Cullings 
2000,  personal  communication). 

3.  RESULTS  AND  DISCUSSION 

We  first  discuss  the  sterol  biosynthetic  pathway,  with 
particular emphasis on oxygen utilization and molecular
evolution of the synthesis enzymes. Three phases of
sterol biosynthesis are explored: the epoxidation of
squalene, the cyclization of oxidosqualene to protosterols
and modification of the sterol skeleton, principally
by oxidative demethylation (figure 1). Second, we
present evidence from lipid analyses of several cyanobacteria
that previous reports of sterol synthesis by
these  organisms  may  have  been  compromised  by 
contamination. 

(a)  Squalene  monooxygenase 

(i)  Mechanism  and  oxygen  requirement 
The  epoxidation  of  squalene  is  the  first  oxygen- 
dependent  step  in  the  sterol  pathway,  and  the  point  at 
which  the  synthesis  of  steroids  diverges  from  that  of 
hopanoids.  Early  work  showed  that  sterol  biosynthesis  in 
yeast  does  not  occur  in  fermentative  cells  and  only 
initiates  at  micromolar  levels  of  O2  (Jahnke  &  Klein 
1983).  The  stereospecific  conversion  of  squalene  to 
(3S)-2,3-oxidosqualene is catalysed by the enzyme
squalene monooxygenase (SQMO; also known as
squalene epoxidase) (figure 2). SQMO is a flavoprotein
that requires O2 and flavin adenine dinucleotide (FAD)
to effect oxygenation, nicotinamide adenine dinucleotide
phosphate, reduced form (NADPH)-cytochrome P450
reductase (itself a flavoprotein) to regenerate FAD and a
soluble  protein  factor  for  squalene  transport  (Lee  et  al. 
2004).  Epoxidation  proceeds  by  the  reaction  of  oxygen 
with  the  bound  dihydroflavin  (FlredH2)  to  give  a 
4a-hydroperoxyflavin  (FlH(4a)-OOH).  The  oxygen 
transfer  occurs  by  nucleophilic  attack  by  the  2,3  double 
bond  of  squalene  on  the  terminal  oxygen  of  the 
4a-hydroperoxide, yielding oxidized flavin (Flox) and
2,3-oxidosqualene (Bruice et al. 1983; Torres & Bruice
1999). The FAD is regenerated by NADPH-cytochrome
P450  reductase  (Laden  et  al.  2000). 

This  reaction  depends  on  the  electrophilic  character 
of  the  hydroxy  group  of  the  hydroperoxide,  since  the 
attack  comes  from  the  2,3-olefin  of  squalene.  The 
epoxidation also benefits from the relatively weak O-O
bond in the hydroperoxide (47 kcal mol⁻¹; Blanksby &
Ellison 2003), making the oxygen transfer energetically
feasible. It has been suggested (Raymond & Blankenship
2004) that the squalene epoxide oxygen might be
derived from a source besides O2, such as water. Such a
scheme would presumably proceed through hydroxylation
of a cofactor followed by squalene epoxidation
using the water-derived hydroxyl. This scheme is
excluded on two grounds: first, such a hydroxyl
would itself have a nucleophilic character, preventing
attack by the squalene olefin; second, the C-O bond of
the hydroxyl is much stronger (96 kcal mol⁻¹;
Blanksby & Ellison 2003), providing a much higher
energy barrier to oxygen transfer. Furthermore, the
next enzyme in the pathway, OSC, requires the 3S (and
rejects the 3R) form of 2,3-oxidosqualene, and
hydroperoxide epoxidation affords the required stereoselectivity.
In this chemical context, an O2-independent
route to oxidosqualene is highly disfavoured.
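
To put the two bond energies quoted above on a common footing, the short sketch below converts them to kJ mol⁻¹ and takes their difference. It is only arithmetic on the values cited from Blanksby & Ellison (2003). The function names and dictionary keys are illustrative assumptions. A bond dissociation energy is not itself an activation barrier, so the gap is at most a rough indication of why the hydroperoxide route is favoured.

```python
# Bond dissociation energies quoted in the text (kcal/mol).
BDE_KCAL = {
    "hydroperoxide O-O": 47.0,
    "hydroxyl C-O": 96.0,
}

KJ_PER_KCAL = 4.184  # thermochemical calorie


def kcal_to_kj(value_kcal: float) -> float:
    """Convert an energy from kcal/mol to kJ/mol."""
    return value_kcal * KJ_PER_KCAL


def bde_gap_kcal() -> float:
    """Difference between the C-O and O-O bond energies (kcal/mol)."""
    return BDE_KCAL["hydroxyl C-O"] - BDE_KCAL["hydroperoxide O-O"]


if __name__ == "__main__":
    for bond, kcal in BDE_KCAL.items():
        print(f"{bond}: {kcal:.0f} kcal/mol = {kcal_to_kj(kcal):.1f} kJ/mol")
    print(f"gap: {bde_gap_kcal():.0f} kcal/mol")
```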

(ii)  Evolutionary  conservation 

SQMO contains several motifs responsible for substrate
and cofactor binding that are conserved in all
known  sequences  of  the  protein,  including  those  of 
prokaryotes  (Lee  et  al.  2000,  2004;  Pearson  et  al. 
2003).  All  known  SQMOs  (and  data  are  available  for 
animals,  fungi,  plants,  amoeboid  and  kinetoplastid 
protists,  and  bacteria)  use  the  epoxidation  mechanism 
described  above.  But  might  there  be  alternative, 
chemically  and  evolutionarily  unrelated  methods  to 
produce  oxidosqualene? 

There  are  enzymes  that  catalyse  the  epoxidation  of 
olefins  without  FAD.  These  are  cytochrome  P450 
oxygenases  that  use  Fe  and  O2  and  transfer  oxygen  to  a 
variety of unsaturated substrates, such as arachidonate
during  eicosanoid  biosynthesis.  As  discussed  below 
with  regard  to  sterol  demethylases  (which  are  P450 
cytochromes),  the  O2  requirement  of  such  enzymes  is 
at  least  as  stringent  as  that  of  the  flavoproteins.  It  is 
noteworthy  that  no  organism  is  known  to  have  replaced 
SQMO  with  a  cytochrome  P450,  though  there  is  no 


Figure 1. Generalized synthetic pathway of sterols. Sterol precursor squalene is oxidatively converted to
oxidosqualene, which is cyclized to one of two protosterols: cycloartenol or lanosterol. The protosterol undergoes subsequent
modifications  including  oxidative  demethylations  and  desaturations  to  result  in  the  terminal  sterol  product.  Enzymes  are 
labelled  with  EC  number  where  available,  or  gene  abbreviation.  Terminal  sterols  yield  derived  steranes  after  burial  and 
diagenesis.  Enzymes  labelled  in  bold  are  discussed  in  the  text.  Those  requiring  molecular  oxygen  are  noted. 
Figure 2. Mechanism for the epoxidation of squalene by squalene monooxygenase (SQMO).


recognized  chemical  or  enzymological  prohibition 
against  this.  In  fact,  a  secondary  squalene  epoxidase 
activity for CYP17, a P450 cytochrome whose primary
role is the hydroxylation of pregnenolone and progesterone
in corticosteroid hormone synthesis in
vertebrates, has recently been demonstrated in mouse
tumour cells (Liu et al. 2005). Were lipid biochemistry
sufficiently plastic, one might expect a gene replacement
to have taken place, wherein the flavoprotein
SQMO would be supplanted by a P450 cytochrome
with multiple activities, thereby reducing the number
of enzymes required to constitute the complete pathway.
In fact, such gene replacement is not observed.
The universal retention of the FAD-dependent oxygenase
even when other (though equally O2-requiring)
mechanisms  are  available  is  an  example  of  the 
evolutionary  conservatism  of  biosynthetic  pathways. 

(b)  Oxidosqualene  cyclase 

(i)  Structure  and  function  of  oxidosqualene  cyclases 
The  tetracyclic  core  that  characterizes  all  sterols  is 
created through the cyclization of squalene (3S)-2,3-epoxide
by the enzyme oxidosqualene cyclase (OSC).
This  is  the  second  step  in  sterol  biosynthesis  after  the 
epoxidation  of  squalene,  and  one  of  the  most  complex 
biochemical  reactions  catalysed  by  a  single  enzyme. 
The  cyclization  is  executed  with  a  remarkable  degree  of 
specificity  and  stereochemical  control.  The  products 
are  either  lanosterol  or  cycloartenol  (figure  1),  the  two 
‘protosterols’  that  are  subsequently  modified  to 
functional  products  such  as  cholesterol  or  phytosterols. 

Several  decades  of  work  have  elucidated  much  of  the 
mechanism  of  cyclization  by  OSC  and  identified 
specific  residues  responsible  for  particular  steps  in  the 
cyclization  cascade.  The  detailed  chemistry  of  this 
enzyme  was  recently  reviewed  (Wendt  et  al.  2000; 
Wendt  2005)  and  the  crystal  structure  of  human  OSC 
determined at 2.1 Å resolution (Thoma et al. 2004)
allowing  visualization  of  the  enzyme  and  interactions 
with  the  substrate  in  unprecedented  detail.  Figure  3 
shows  two  views  of  the  active  site  of  human  OSC  with 
its  product,  lanosterol.  Briefly,  the  key  steps  in  the 
cyclization  (and  residues  responsible)  include: 
(i) Positioning of squalene 2,3-epoxide in a prefolded
configuration within the active site (Y98,
Y704).

(ii) Protonation of the epoxide group by the
catalytic acid (D455, C456, C533).

(iii) Ring formation, during which cation intermediates
are stabilized by cation-π interactions with
aromatic residues of the active site (W387,
F444, W581, F696) and the cation migrates
out to C20.

(iv) Skeletal rearrangement via hydride- and methyl-shifts
(largely spontaneous) as the proton
migrates back to a region of high π-electron
density around the B/C rings.

(v) Deprotonation by basic residues to quench
the final protosteryl cation (Y503, H232); the
position of the deprotonation determines the
OSC product (lanosterol versus cycloartenol).
D455 is ultimately reprotonated by E459.

The  sequences  of  genes  encoding  OSCs  are  now 
available  for  many  organisms,  principally  animals, 
plants  and  fungi,  but  also  several  protists.  Alignment 
of these sequences shows a very high degree of conservation
across the breadth of eukaryotic diversity (at least
five of the kingdom-level divisions of Adl et al. (2005)).
The active-site residues mentioned above are absolutely
conserved, i.e. 100% identity at the amino acid
level. Alternative mechanisms for oxidosqualene cyclization
do not appear to have arisen in any of the
eukaryotic lineages sampled to date. Together with the
conservation of function seen in the squalene epoxidases,
this strongly suggests that, at least, the initial
steps  in  sterol  biosynthesis  were  present,  generally  in 
their  modern  form,  in  the  last  common  ancestor  of  all 
eukaryotes. 
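
As a minimal sketch of what "absolutely conserved" means operationally, the following checks whether chosen alignment columns hold a single identical residue in every sequence. The toy alignment is hypothetical and is not real OSC sequence data; the function names and the 0-based column indexing are likewise assumptions made for illustration.

```python
def column_identity(aligned_seqs, column):
    """Fraction of sequences sharing the most common residue at a 0-based column."""
    residues = [seq[column] for seq in aligned_seqs]
    most_common = max(set(residues), key=residues.count)
    return residues.count(most_common) / len(residues)


def absolutely_conserved(aligned_seqs, columns):
    """True if every listed column holds one identical, non-gap residue."""
    for col in columns:
        residues = {seq[col] for seq in aligned_seqs}
        if len(residues) != 1 or "-" in residues:
            return False
    return True


# Hypothetical toy alignment (NOT real OSC sequences); columns 1, 2 and 5
# mimic a D-C-x-x-E style motif that is identical in all three rows.
TOY_ALIGNMENT = [
    "MDCTAE",
    "LDCSAE",
    "MDCTGE",
]

if __name__ == "__main__":
    print(absolutely_conserved(TOY_ALIGNMENT, [1, 2, 5]))
    print(column_identity(TOY_ALIGNMENT, 0))
```

With a real alignment, the columns of interest would first have to be mapped from H. sapiens OSC numbering to alignment positions, a step this sketch leaves out.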

(ii)  Oxidosqualene  cyclase  product  profiles  and  eukaryote 
phylogeny 

There  are  two  main  types  of  OSCs,  based  on  the  end 
product  of  the  cyclization:  lanosterol  synthases  and 
cycloartenol  synthases.  The  cyclization  process  in  the 
two  types  of  enzymes  is  identical  until  the  final 
deprotonation  step.  A  deprotonation  from  C9  forms 
the 8,9-double bond of lanosterol whereas a deprotonation
from C19 allows the cycloartenol cyclopropyl
ring to close. Thus far, lanosterol synthase has been
found only among the opisthokonts (animals + fungi +
choanozoa), trypanosomatids (Trypanosoma, Leishmania)
and dinoflagellates (Giner et al. 1991). All other
eukaryotes  that  have  been  examined  in  this  regard  (at 
least  members  of  the  higher  plants,  green  and  red  algae, 
amoebozoa,  diatoms,  euglenids  and  heterolobosea) 
make  cycloartenol  as  their  protosterol. 

Site-directed  mutagenesis  experiments,  notably 
those  of  Matsuda  and  co-workers  (Meyer  et  al.  2000, 
2002; Joubert et al. 2001; Segura et al. 2002; Lodeiro
et  al.  2004)  have  indicated  the  key  residues  that  control 
the  product  profile  of  OSCs,  including  T381,  C449 
and  V453.  From  analysis  of  the  crystal  structure,  each 
appears  to  affect  the  position  of  the  catalytic  base  dyad 
H232-Y503.  By  controlling  the  position  from  which 
the  protosterol  cation  is  deprotonated,  these  residues 
determine  which  product  will  be  formed  by  the  cyclase. 
The  second-sphere  residue  C449  is  particularly 
interesting:  the  H449N  mutant  of  Arabidopsis  thaliana 
OSC is the most efficient lanosterol synthase to be
generated from a wild-type (WT) cycloartenol synthase
(88% lanosterol yield). Given its distance from the
substrate (approx. 7.4 Å), its control of the product
profile  is  likely  indirect.  Interestingly,  mutagenesis 
experiments  have  yet  to  induce  cycloartenol  formation 
from  a  WT  lanosterol  synthase.  Natural  OSCs 
generally  conform  to  the  residue-product  relations 
found  in  mutagenesis  experiments  (figure  3a),  with 
exceptions  that  may  be  evolutionarily  informative. 
From  the  protein  alignment,  patterns  in  differential 
conservation  of  these  product-controlling  residues  can 
be  discerned:  opisthokont  lanosterol  synthases  are 
T381/C,Q449/V453, while cycloartenol synthases are
all  Y381/H449/I453.  Two  exceptions  to  this  pattern 
emerge:  the  lanosterol-producing  OSCs  of  the  bacteria 
M.  capsulatus  and  G.  obscuriglobus  (discussed  further 
below)  and  the  trypanosomatid  lanosterol  synthases. 

The  trypanosomatids  make  lanosterol  as  their 
protosterol,  despite  the  presence  of  a  tyrosine  at 
position  381.  Several  lines  of  evidence  indicate  that 
the trypanosomatids ancestrally possessed a cycloartenol
synthase. First, the T381Y mutation has been
shown to decrease the efficiency of the Saccharomyces
cerevisiae lanosterol synthase, and induce the formation
of parkeol and lanostene-3,9-diol (and not cycloartenol)
as secondary products. The fixation of a T381Y
mutation  in  a  lanosterol  synthase  is  thus  unlikely. 
Second,  euglenids  Euglena  gracilis  (Anding  et  al.  1971) 
and  Astasia  longa  (Rohmer  &  Brandt  1973)  and 
heterolobosea  (Naegleria  sp.,  Raederstorff  &  Rohmer 
1987), more deeply branching than the trypanosomatids
within the Excavate kingdom (Simpson et al.
2005), have been shown to make sterols via the
cycloartenol pathway. Third, post-cyclization modification
of sterols in the kinetoplastids follows a
cycloartenol-type route; the 14α-demethylase of
Trypanosoma  brucei  has  recently  been  shown  to  be 
specific  for  the  cycloartenol  pathway  intermediate 
obtusifoliol  (Lepesheva  et  al.  2004).  Taken  together, 
this  evidence  suggests  the  following  hypotheses:  the 
trypanosomatids  began  with  a  cycloartenol  synthase 
(Y381/H449/I453) which underwent two mutations,
H449Q and I453V (each requiring a single nucleotide
change) to yield a lanosterol synthase. Downstream
modification of the protosterol remained essentially as
the cycloartenol pathway, but at least some of the
enzymes must have adapted to different substrates; in
particular smt1 and smo1 (figure 1). It is worth noting
that the Y381T/H449Q/I453V triple mutant of
A.  thaliana  OSC  is  a  reasonably  efficient  lanosterol 
producer;  second-sphere  mutations  may  have  further 
enhanced  the  specificity  of  the  trypanosomatid  OSC. 
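
The statement that H449Q and I453V each require a single nucleotide change can be checked against the standard genetic code. The sketch below builds the codon table and reports the minimum number of base substitutions separating any codon of one amino acid from any codon of another. It assumes the standard nuclear code applies to the genes in question, and the function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Amino acids in standard-code order for codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """All codons encoding the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def min_nucleotide_changes(aa_from, aa_to):
    """Smallest number of base substitutions converting aa_from into aa_to."""
    return min(
        sum(a != b for a, b in zip(c1, c2))
        for c1 in codons_for(aa_from)
        for c2 in codons_for(aa_to)
    )


if __name__ == "__main__":
    for label, (src, dst) in {"H449Q": ("H", "Q"), "I453V": ("I", "V"),
                              "Y381T": ("Y", "T")}.items():
        print(label, min_nucleotide_changes(src, dst))
```

Under this code, H to Q and I to V are each reachable by one substitution, whereas Y to T needs two, which fits the proposal that the trypanosomatid enzyme retained Y381.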

Lanosterol  synthesis  appears  to  have  arisen  at  least 
twice  among  the  eukaryotes:  once  in  an  ancestor  of  the 
opisthokonts and once in an ancestor of the trypanosomatids
after the divergence of the euglenids.
Dinoflagellates  have  also  been  reported  to  make 
lanosterol  (Giner  et  al.  1991),  but  no  sequence 
information  is  currently  available  for  their  cyclases; 
they  may  represent  a  third  instance  of  innovation,  or 
may  have  acquired  a  lanosterol  synthase  laterally.  It  is 
as  yet  unclear  which  type,  lanosterol  synthase  or 
cycloartenol  synthase,  was  the  ancestral  form  of  OSC. 
Given  the  hypothesized  independent  originations  of 
lanosterol  synthase  and  the  apparent  difficulty  in 
mutating  a  WT  lanosterol  synthase  to  produce 
cycloartenol,  it  is  tempting  to  infer  cycloartenol 
synthase  as  the  more  ancient  of  the  two  forms.  If,  on 
the  other  hand,  eukaryotic  phylogeny  is  rooted  in  the 
branch  leading  to  the  opisthokonts  (Arisue  et  al.  2005) 
there  is  no  strong  parsimony  argument  either  way. 

(iii)  Prokaryotic  oxidosqualene  cyclases 
While  sterol  synthesis  is  nearly  ubiquitous  among  the 
Eukarya,  only  three  instances  are  known  among 
prokaryotes, all bacteria: M. capsulatus (a γ-proteobacterium),
G. obscuriglobus (a planctomycete) and a
paraphyletic group of myxobacteria (δ-proteobacteria).
These  organisms  are  not  closely  related  and  the  reason 
for  the  sparse  appearance  of  sterol  synthesis  in 
phylogenetically  distant  bacterial  taxa  remains  unclear. 
If  it  is  the  result  of  vertical  inheritance  from  a  common, 
sterol-synthesizing bacterial ancestor, dozens of parallel
losses of the entire pathway in many lineages would
be  required.  Alternatively,  sterol  biosynthesis  genes 
may  have  been  acquired  by  bacteria  via  lateral  transfer 
from  eukaryotes,  potentially  followed  by  one  or  more 
intra-bacterial transfer events. Such a eukaryote-to-bacteria
gene transfer has been proposed to account for
the  similarly  sparse  occurrence  of  glutaminyl-tRNA 
synthetase  among  the  Bacteria  (Lamour  et  al.  1994; 
Brown  &  Doolittle  1999).  At  present,  however,  there  is 
not  sufficient  evidence  to  draw  clear  conclusions 
concerning  the  evolutionary  relationships  between 
eukaryotic  and  prokaryotic  OSCs. 

Of  bacterial  groups,  sterol  synthesis  is  most  widely 
distributed  among  the  myxobacteria.  Of  11  genera  (88 
total  strains)  tested  by  Bode  et  al.  (2003)  only  four  were 
found  to  produce  sterols.  The  sterol-producing  genera 
do  not  form  a  monophyletic  group  in  the  myxobacterial 
phylogeny  of  Shimkets  &  Woese  (1992),  implicating 
some  combination  of  multiple  gains,  multiple  losses 
and  transfer  events  to  explain  the  distribution  of  this 
capability.  The  most  complete  bacterial  sterol  synthesis 
pathway  is  in  the  myxobacterium  Nannocystis  excedens, 
Figure 3. (a) Alignment of oxidosqualene cyclase (OSC) protein sequences. The catalytic acid, D455, is highlighted in red.
Positions 381, 449 and 453 are differentially conserved between lanosterol synthases (yellow) and cycloartenol synthases
(green). Highly conserved residues that are substituted in bacterial OSCs are highlighted in blue. Residues that are shown in the
structures in (b) and (c) are indicated by asterisks. Numbering (throughout this figure and in text) refers to Homo sapiens OSC.
(b) View of the active site of H. sapiens OSC with lanosterol shown in light blue. The hydrogen-bonding network around the
catalytic acid D455 is indicated by dashed lines. H232 and Y503 constitute the catalytic base dyad and effect the final
deprotonation. (c) View of the active site in the molecular plane of lanosterol. Note the distance of second-sphere residue C449
from the substrate. (b) and (c) are drawn from the crystal structure determined by Thoma et al. (2004).


which  can  demethylate  lanosterol  at  C4  and  C14  and 
progress  as  far  as  lathosterol  (figure  1).  Interestingly, 
the myxobacteria cyclize oxidosqualene to both lanosterol
and cycloartenol; Cystobacter sp. produce both
protosterols  (Bode  et  al.  2003).  If  OSC  was  laterally 
acquired  by  the  myxobacteria,  it  is  unclear  from  the 
phylogenetic  distribution  alone  which  type  they  initially 
got  from  eukaryotes. 

Three  bacterial  OSC  sequences  are  presently 
available:  the  lanosterol  synthase  of  M.  capsulatus,  the 
OSC  of  G.  obscuriglobus,  which  produces  a  mixture  of 
lanosterol  and  parkeol  and  the  cycloartenol  synthase  of 
the  myxobacterium  Stigmatella  aurantiaca.  Overall, 
these  enzymes  are  quite  similar  to  the  eukaryotic 
cyclases  and  make  use  of  the  same  catalytic  groups  to 
effect  the  same  chemistry.  Of  the  three,  the  Stigmatella 
OSC  is  most  like  eukaryotic  enzymes  (figure  3).  It 
shows the standard Y381/H449/I453 pattern of
eukaryotic cycloartenol synthases, and fewer
differences among highly conserved residues than the
other  two  known  sequences.  The  M.  capsulatus  and  G. 
obscuriglobus  OSCs  are  another  exception  to  the  pattern 
of  differential  conservation  of  residues  381,  449  and 
453. These cyclases are Y381/H449/V453, making 453
the  only  position  to  be  consistently  differentially 
conserved  between  WT  lanosterol  and  cycloartenol 
synthases.  Both  the  M.  capsulatus  and  G.  obscuriglobus 
OSCs  do,  however,  have  modifications  to  other  active- 
site residues (not yet explored in mutagenesis experiments)
that may contribute to their unusual product
profiles. In particular, the W230Y, G380R and N382(V,S)
substitutions (residues otherwise conserved
across the Eukarya and in Stigmatella) could all affect
the configuration of the active site near the deprotonating
base, altering the enzyme product. Further,
Gemmata  alone  has  F444L  and  S445G  substitutions; 
these  are  residues  positioned  close  to  T381  and  C449 
and  may  contribute  to  the  Gemmata  OSC’s  high  yield 
Figure 4. Steps in the oxidation of the 14α angular methyl group of sterols by the CYP51 active site and molecular oxygen. The
methyl group is oxidized three times by the cycle (a-h), each using one molecule of O2. Note that the Fe(IV) + Por+ complex
is written Fe(V) for simplicity. After the third oxidation the methyl group is removed.


of parkeol, a protosterol otherwise very minor among
WT  cyclases. 
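
The residue patterns discussed above can be collected into a small lookup, sketched here as an illustration only. Positions follow H. sapiens OSC numbering. The opisthokont pattern is read as T381 with C or Q at 449 and V453, and the trypanosomatid entry (Y381/Q449/V453) is inferred from the H449Q and I453V hypothesis rather than taken from a sequence. The function names are hypothetical, and the single-position heuristic reflects only the observation that 453 is the one position consistently differing between WT lanosterol and cycloartenol synthases.

```python
# Residues at positions 381/449/453 (H. sapiens OSC numbering), as described in the text.
KNOWN_PATTERNS = {
    ("T", "C", "V"): "lanosterol synthase (opisthokont pattern)",
    ("T", "Q", "V"): "lanosterol synthase (opisthokont pattern)",
    ("Y", "H", "I"): "cycloartenol synthase",
    ("Y", "H", "V"): "lanosterol-producing bacterial OSC "
                     "(M. capsulatus / G. obscuriglobus pattern)",
    # Inferred from the proposed H449Q + I453V route, not from sequence data.
    ("Y", "Q", "V"): "lanosterol synthase (inferred trypanosomatid pattern)",
}


def describe_osc(res381, res449, res453):
    """Return the pattern label from the text, if this residue triple matches one."""
    key = (res381.upper(), res449.upper(), res453.upper())
    return KNOWN_PATTERNS.get(key, "pattern not described in the text")


def predict_protosterol(res453):
    """Heuristic using position 453 alone: V -> lanosterol, I -> cycloartenol."""
    return {"V": "lanosterol", "I": "cycloartenol"}.get(res453.upper(), "unknown")


if __name__ == "__main__":
    print(describe_osc("Y", "H", "I"))
    print(predict_protosterol("V"))
```

Such a lookup says nothing about minor products such as parkeol, which the text attributes to other active-site substitutions.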

(iv)  Oxygen  requirements  of  cyclization 
The cyclization of oxidosqualene to the sterol hydrocarbon
skeleton is not oxidative and does not require an
external electron acceptor. OSC is, however, highly
specific for (3S)-2,3-oxidosqualene; neither (3R)-2,3-oxidosqualene,
squalene epoxidized at other positions,
nor unoxygenated squalene is a suitable substrate.
OSC  has  likely  always  acted  on  oxidosqualene,  so  the 
ability  to  epoxidize  squalene  was  a  prerequisite  for 
production  of  the  6,6,6,5-ring  structure.  Indeed,  the 
catalytic  acid  group  of  OSC  (the  conserved  DCxxE 
motif)  may  not  be  acidic  enough  to  protonate  the  olefin 
of  squalene,  but  can  manage  the  epoxide.  That  the 
sterol  cyclization  chemistry  was  present  in  its  modern 
(oxidosqualene-dependent)  form  in  the  eukaryotic  last 
common  ancestor  is  supporting  evidence  that  early 
eukaryotic  life  had  the  means  to  oxygenate  squalene. 

(c)  Sterol  demethylases 

According  to  Bloch’s  (1987)  postulate,  the  sequential 
departure  of  the  three  nuclear  methyl  groups  from  the 
protosterol lanosterol, in the order 14α-methyl, 4α-methyl,
and 4β-methyl, leads to an improvement in the
fitness of the molecule, reaching a maximum with
cholesterol. While it is now known that the order of
removal of the methyl groups does vary in plants
compared to the above order in fungi and animals,
there are some lines of experimental evidence that
functional fitness improves (Bloch 1983).

Demethylation at the 14α carbon is catalysed by an
oxidative demethylase of the cytochrome P450 superfamily
in plants, animals and fungi, and a few bacteria.
Oxidation of each of the two C-4 methyl groups is
carried out by an unrelated enzyme of the sterol
desaturase family. The functions of these enzymes are
essential for eukaryotes, perhaps because demethylation
of the α face of sterols is required to allow proper
sterol-fatty acid interaction to achieve the maximum
membrane microviscosities (Dahl et al. 1980). In
animals, removal of the 14α-methyl group is the first
step in the sterol synthetic pathway following cyclization,
and the CYP51 substrate is lanosterol (figure 1).
In  filamentous  fungi,  lanosterol  is  methylated  at  C-24 
before  being  demethylated  at  C-14  and  the  CYP51 
substrate  is  eburicol  (24-methylene-24,25-dihydrola- 
nosterol).  Several  steps  occur  along  the  cycloartenol 
pathway  in  plants  before  14a-demethylation,  including 
one  demethylation  at  C-4  and  the  opening  of  the 
cyclopropyl  ring  so  that  plant  CYP51  operates  on 
obtusifoliol. 

(i)  C-14  demethylases 

The removal of the 14α-methyl group of sterols (figure 4)
is performed by sterol 14α-demethylase (CYP51).
CYP51 belongs to the cytochrome P450 enzyme
superfamily, which includes more than 2000 members in
all  three  domains  of  life.  All  P450s  are  used  in  oxidative 
reactions  on  various  molecules  and  all  require  molecular 
oxygen  as  a  substrate.  The  active  site  of  P450s  contains  a 
Fe-protoporphyrin  IX  (haem).  CYP51  is  the  only 
cytochrome  P450  that  performs  the  same  function  in 
different  biological  domains.  Some  researchers  have 
suggested  that  CYP51  is  the  ancestral  P450  (Nelson  & 
Strobel  1989;  Aoyama  et  al.  1994;  Rezen  et  al.  2004). 
CYP51 operates on one of four substrates: lanosterol
(in  animals  and  fungi),  dihydrolanosterol  (animals), 
obtusifoliol  (plants  and  kinetoplastids),  or  eburicol 
(fungi).  These  substrates  differ  only  in  the  nature  of  the 
side  chain  and  the  presence  or  absence  of  a  second  methyl 
group  at  C-4. 

The  small  differences  between  substrates  probably 
account  for  a  general  lack  of  substrate  specificity  among 
the  demethylases.  Most  CYP51s  can  demethylate  any 
of  the  four  substrates,  although  a  few  plant  CYP51s  are 
obtusifoliol-specific.  In  humans,  a  defect  in  CYP51 
causes Antley-Bixler syndrome. In yeast, 14α-demethylase
inhibition is fatal, and this makes the
enzyme  an  attractive  target  for  fungicides.  Fungicides 
may  take  advantage  of  the  differences  between  fungal 
and  animal  CYP51  active  sites  to  selectively  inhibit  the 
fungal  enzyme. 

Rezen  et  al.  (2004)  found  that  CYP51s  separate 
phylogenetically  into  plant  (obtusifoliol),  animal 
(lanosterol)  and  fungal  (eburicol)  groups.  Bacterial 
CYP51s including those of M. capsulatus and Mycobacterium
tuberculosis both fall within the plant lineage.
The  purified  M.  tuberculosis  protein  will  demethylate 
lanosterol  in  vitro  although  obtusifoliol  is  preferred. 
However,  despite  the  phylogenetic  placement  of  its 
CYP51,  M.  capsulatus  synthesizes  lanosterol.  It  has  yet 
to  be  demonstrated  whether  lanosterol  is  the  substrate 
for  M.  capsulatus  CYP51.  Jackson  et  al.  (2002)  note 
that  the  M.  capsulatus  CYP51  is  novel  in  that  it  is  linked 
to  a  ferredoxin  domain,  but  that  the  CYP51  in 
M.  tuberculosis  is  part  of  an  operon  in  which  it  is 
followed  by  ferredoxin.  The  M.  capsulatus  gene  may  be 
the  result  of  a  lateral  transfer  event  from  M.  tuberculosis 
followed  by  a  mutation.  Perhaps  the  best  example  of 
the  broad  substrate  specificity  of  these  enzymes  is 
Streptomyces coelicolor, which contains a gene with low-level
homology to bacterial CYP51s and that demethylates
eburicol but not lanosterol (Lamb et al. 2003).
However,  this  gene  is  not  a  CYP51,  and  the  conserved 
sites  differ  significantly  from  CYP51.  Streptomyces 
coelicolor  does  not  contain  sterols,  and  the  in  vivo 
function  of  this  protein  is  unknown. 

The  particulars  of  amino  acid  residue  participation 
in  substrate  binding  and  catalysis  of  CYP51  are 
unknown. Podust and co-workers (Podust et al.
2001b) noted that two channels with access to the
haem may serve to transport substrate and product to
and from the active site. They (Podust et al. 2001a) also
performed modelling experiments based on the crystal
structure of M. tuberculosis CYP51 in the presence of
the azole inhibitors fluconazole and 4-phenylimidazole.
They found that the inhibitors coordinate themselves
with the haem iron in the large (2600 Å³) cavity
opposite the cysteine (C394) that binds the haem to
the protein. Amino acids surrounding this cavity were
considered likely to be involved in binding and/or
catalysis and showed a high degree of conservation
across CYP51s from many organisms. Only approximately
10% (41 conserved residues) of CYP51s are
absolutely  conserved  across  all  domains  of  life  (with  the 
exception  of  five  residues  in  the  kinetoplastids)  and 
most  of  these  are  not  at  the  active  site.  Half  of  these 
sites  have  been  examined  by  site-directed  mutagenesis 
experiments,  which  have  shown  that  most  of  them  are 
essential  for  CYP51  function.  Naturally  occurring 
azole-resistant  strains  of  Candida  albicans  contain  the 
mutation  Y132H,  which  does  not  directly  interact  with 
substrate and may reflect a more complicated interaction
between protein and substrate, and Bellamine
et al. (2004) report from several site-directed mutagenesis
experiments that azole and substrate binding are
uncoupled. Among the residues identified by Podust
et al. (2001a,b) as participating in the active site, there
are  several  positions  which  differ  between  CYP51 
subfamilies  operating  on  different  substrates. 

Other  site-directed  mutagenesis  studies  on  10 
residues  that  were  believed  to  be  strictly  conserved 
among  CYP51s  (Lepesheva  &  Waterman  2004)  found 
that  most  mutants  lost  all  demethylase  activity, 
although  one  (A197G)  showed  enhanced  demethylase 
activity.  Based  on  a  combination  of  mutagenesis 
experiments  and  the  crystal  structure  of  CYP51  they 
postulated  potential  substrate  binding  sites:  D90  as  a 
binding site for the sterol 3β-OH, L172 and R194
associating  with  the  side  chain,  and  F82  binding  the 
sterol  A  or  B  ring.  Kinetoplastid  genomes  revealed 
differences  from  the  ‘strictly  conserved’  at  five  of  the 
CYP51  residues  in  which  mutations  produced  total  or 
near total loss of function in M. tuberculosis (D90A,
L127M,  G175S/C,  R194C,  D195H/R).  Lepesheva 
et  al.  (2004)  showed  that  the  kinetoplastid  CYP51s 
were  obtusifoliol-specific  and  that  mutation  of  these 
five  residues  back  to  the  conserved  state  did  not 
improve  the  ability  of  the  kinetoplastid  enzyme  to 
metabolize  lanosterol.  A  clearer  understanding  of  the 
chemical  role  of  these  conserved  residues  will  be 
necessary  to  discern  the  evolutionary  implications  of 
these  changes. 

The  general  mechanism  of  substrate  oxidation  by 
P450  enzymes  is  known  from  several  decades  of  work 
on the camphor-oxidizing P450 of Pseudomonas putida.
It  consists  of  eight  steps  (Groves  &  Han  1995;  Meunier 
et  al.  2004): 

(i) substrate binding and subsequent displacement
of the haem Fe(III);

(ii)  electron  transfer  to  Fe(III)  from  a  reductase 
cofactor  to  yield  Fe(II)  and  a  haem  with  a 
negative  charge; 

(iii)  binding  of  molecular  oxygen  to  the  ferrous  iron 
yielding  a  Fe(III)-dioxygen  bond; 

(iv)  transfer  of  a  second  electron  to  the  haem  to 
yield  a  negatively  charged  Fe(III)-peroxo 
complex  which  is  a  very  strong  nucleophile; 

(v)  protonation  of  the  Fe(III)-peroxo  complex 
yielding  a  P450-Fe(III)-OOH  which  also 
behaves  as  a  nucleophile.  This  protonation 
likely  involves  the  action  of  an  acidic  residue 
(D251  in  P450cam)  near  the  active  site; 




(vi)  a  second  protonation  of  the  distal  oxygen  atom 
and cleavage of the O-O bond, yielding a
molecule  of  water  and  the  formation  of  the 
reactive  electrophilic  iron-oxo  species  Fe(IV); 

(vii)  transfer  of  the  oxygen  atom  from  the  iron-oxo 
complex  to  the  substrate; 

(viii)  dissociation  of  the  product. 

In CYP51, this process operates three times on the
14α-methyl group of the sterol, successively
converting it to 14α-hydroxymethyl,
14α-carboxaldehyde and 14α-formyl intermediates,
subsequently eliminating formic acid and leaving a
Δ14,15 double bond in the sterol. It is significant that
the initial hydroxylation of the methyl, which requires
the abstraction of H•, is achieved with the high redox
potential associated with the Fe(IV) complex, which is
achieved only through the action of molecular oxygen.
This complex has an effective oxidation state of Fe(V)
due to the additional charge on the porphyrin. See
Meunier et al. (2004) for a detailed review.
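
The stoichiometry implied by the eight-step cycle can be tallied in a few lines. The sketch below is only a bookkeeping aid: it assigns to each step the electrons, protons and O2 named in the list above, and it takes at face value the statement that CYP51 runs the same cycle three times. The class and function names (Step, P450_CYCLE, tally) are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of the generic P450 cycle and what it consumes or releases."""
    label: str
    electrons_in: int = 0
    protons_in: int = 0
    o2_in: int = 0
    water_out: int = 0
    o_to_substrate: int = 0


# Steps (i)-(viii) as listed in the text; counts follow the wording of each step.
P450_CYCLE = (
    Step("(i) substrate binding"),
    Step("(ii) first electron transfer, Fe(III) -> Fe(II)", electrons_in=1),
    Step("(iii) O2 binding to ferrous iron", o2_in=1),
    Step("(iv) second electron transfer, Fe(III)-peroxo", electrons_in=1),
    Step("(v) first protonation, Fe(III)-OOH", protons_in=1),
    Step("(vi) second protonation, O-O cleavage", protons_in=1, water_out=1),
    Step("(vii) oxygen atom transfer to substrate", o_to_substrate=1),
    Step("(viii) product dissociation"),
)

FIELDS = ("electrons_in", "protons_in", "o2_in", "water_out", "o_to_substrate")


def tally(cycles: int = 1) -> dict:
    """Sum each quantity over one cycle and scale by the number of cycles."""
    return {f: cycles * sum(getattr(s, f) for s in P450_CYCLE) for f in FIELDS}


if __name__ == "__main__":
    print("one cycle:", tally(1))
    # Assumption from the text: CYP51 repeats the cycle three times per sterol.
    print("CYP51 (three cycles):", tally(3))
```

On this accounting each turn of the cycle uses one O2 and two electrons, so the three turns needed to remove the 14α-methyl group consume three O2.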

(ii)  C-4  demethylases 

Demethylation  of  a  C-4  methyl  group  requires  the 
action  of  a  suite  of  three  enzymes  working  sequentially 
(Gachotte  et  al.  1998,  1999;  Benveniste  2004): 

(i) C-4α-methyl oxidase (ERG25 in yeast and
smo1/smo2 in plants), which operates on the
4α-methyl carbon three times with molecular
oxygen to produce a 4α-carboxylic acid (Darnet
& Rahier 2003, 2004);

(ii) 4α-carboxysterol-C-4-dehydrogenase/C-4-decarboxylase
(4α-CD or ERG26 in yeast) which
decarboxylates the 4α-acid and produces a
3-oxosteroid (Gachotte et al. 1998; Rondet
et al. 1999);

(iii) NADPH-dependent sterone reductase
(ERG27) which stereospecifically reduces the
3-oxosteroid to a 3β-OH.

These enzymes always act on the 4α-methyl group.
In the first demethylation, the 4α-methyl is removed,
and the 4β-methyl group rearranges to the 4α position.
In both animals and fungi, the two methyl groups are
sequentially removed following 14α-demethylation.
The C-4 demethylase enzymatic suite converts
4,4-dimethyl-5α-cholesta-8,14,24-trien-3β-ol to 4α-methyl
zymosterol, which is subsequently converted by the
same suite of enzymes to zymosterol (Benveniste
2004). In plants, the first methyl group is removed at
the level of 24-methylene cycloartenol, which is
converted to cycloeucalenol. This pathway goes
through several subsequent steps, including the
removal of the 14α-methyl group, before being
demethylated at C-4 a second time at the level of
24-ethylenelophenol, which is converted to
24-ethylenelathosterol. Plant genomes show the presence of two
distinct sterol methyl oxidases (smo1 and smo2)
(Darnet & Rahier 2004), and gene silencing experiments
in A. thaliana have demonstrated that each of
these operates with high substrate specificity: smo1 on
24-methylene cycloartenol and smo2 on
24-ethylenelophenol (Benveniste 2004).

Phil.  Trans.  R.  Soc.  B  (2006) 


Other  taxa  which  synthesize  C-4  desmethyl  sterols 
include  the  bacteria  M.  capsulatus  and  N.  excedens,  and 
the kinetoplastids. Genome sequences of M. capsulatus
and three kinetoplastid species are available in
GenBank and a comparison of the sequenced genomes
to known sequences for erg25, smo1 and smo2 revealed
no  significant  homologues  to  any  of  these  genes  among 
these  organisms.  It  is  possible  that  these  organisms  are 
using  an  alternative  enzyme  for  C-4  sterol 
demethylation. 

(iii)  Energetic  constraints  and  oxygen  usage 
The  mechanism  of  sterol  demethylation  functionalizes 
a  methyl  group  attached  to  a  quaternary  carbon  by 
attaching  an  oxygen  atom.  CYP51  achieves  this  by 
the  homolytic  cleavage  of  a  C-H  bond  to  create  a 
methyl  radical,  which  subsequently  reacts  with  the 
oxygen  distally  attached  to  the  haem  of  the  P450 
(Meunier et al. 2004). This is an energetically expensive
process  which  is  overcome  in  part  by  using  the  most 
powerful  oxidizing  agent  available:  molecular  O2. 

The C-H bond dissociation energy (D) of a methyl
group attached to a quaternary carbon is 401 kJ mol⁻¹
(March 1992). However, bond energies of the transition
state in the demethylase enzymes are unknown
and so this does not indicate actual activation energy of
the methyl group, but gives an indication of the stability
of the radical. This dissociation energy is higher
(indicating a less stable radical) than that of secondary
(401 kJ mol⁻¹) and tertiary (401 kJ mol⁻¹) carbons,
but not as high as that of the C-H bond in primary
(419 kJ mol⁻¹) carbons, methane (438 kJ mol⁻¹) or
benzene (464 kJ mol⁻¹).

Microbial  degradation  of  both  benzene  and  methane 
takes  place  readily  in  the  presence  of  oxygen,  but  is  also 
possible  under  many  conditions  as  mildly  exergonic 
processes  with  sulphate  as  a  terminal  electron  acceptor. 
This  indicates  that  abstraction  of  C-H  bonds  stronger 
than  those  in  sterol  methyl  groups  is  possible  without 
molecular oxygen. Although thermodynamically feasible,
an enzyme that demethylated sterols anaerobically
would  be  energetically  much  more  expensive. 
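
The comparison behind this argument can be made explicit with a small calculation. The sketch below uses only the dissociation energies quoted above for the sterol methyl group, primary carbons, methane and benzene, and reports how much stronger the C-H bonds of the two anaerobically degraded substrates are. The dictionary and function names are hypothetical, chosen for illustration.

```python
# C-H bond dissociation energies in kJ/mol, as quoted in the text (March 1992).
STEROL_METHYL = "sterol methyl (on quaternary carbon)"
BDE_KJ_PER_MOL = {
    STEROL_METHYL: 401,
    "primary carbon": 419,
    "methane": 438,
    "benzene": 464,
}

# Substrates the text describes as degradable with sulphate as electron acceptor.
ANAEROBICALLY_DEGRADED = ("methane", "benzene")


def excess_over_sterol_methyl(name: str) -> int:
    """Return how much stronger (kJ/mol) a C-H bond is than the sterol methyl C-H."""
    return BDE_KJ_PER_MOL[name] - BDE_KJ_PER_MOL[STEROL_METHYL]


if __name__ == "__main__":
    for name in ANAEROBICALLY_DEGRADED:
        print(f"{name}: +{excess_over_sterol_methyl(name)} kJ/mol vs sterol methyl")
```

Both differences are positive, which is the sense in which anaerobic C-H abstraction from a sterol methyl group is feasible in principle; the calculation says nothing about activation energies or the overall energetic cost to the cell.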

Sterol demethylases have evolved at least twice, and
in each case require three molecules of molecular
oxygen to catalyse the reaction. Constructed phylogenetic
trees of CYP51 cluster those demethylases into
groups that parallel the families of sterol cyclases,
suggesting that oxidative sterol demethylation is an
ancient and conserved pathway that has existed at least
since the time of the split between plants, kinetoplastids
and opisthokonts (Rezen et al. 2004). Nature may
contain at least one undescribed sterol demethylase
(the C-4 demethylase in M. capsulatus and kinetoplastids).
As this undescribed enzyme occurs in obligate
aerobes, it is likely to require molecular oxygen.

Figure 5. Total ion chromatograms showing sterols in the
total lipid extracts of four of the investigated cyanobacterial
cultures (a-d). Phormidium sp. RC04 and OSS4 are isolates
from Yellowstone National Park. All organisms were grown
for a biomarker and isotopic investigation of mat-forming
communities (Jahnke et al. 2004). RI, relative intensity; I.S.,
internal standard.

On strict chemical grounds it may be possible to
devise biosynthetic routes to sterols that proceed
anaerobically as has been recently proposed (Raymond
& Blankenship 2004). However, any postulated
anaerobic pathway for sterol synthesis must replace
five enzymes which use a combined 11 or 12 molecules
of O2 with anaerobic enzymes capable of performing
the equivalent process, and further postulate that all of
these enzymes have been lost or are unknown. Any
hypothesis that proposes enzymes existed in the past,
and for which all evidence has been lost, is not testable.
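
The oxygen budget can be laid out as simple arithmetic. In the sketch below, the CYP51 and C-4 methyl oxidase entries follow the counts given in this section (three O2 per methyl group removed, with two methyl groups at C-4). The two single-O2 entries are assumptions added here to close the gap to a total of 11 for cholesterol; they are not itemized in this section, and the names O2Use, BUDGET and total_o2 are hypothetical.

```python
from typing import NamedTuple


class O2Use(NamedTuple):
    enzyme: str
    o2_per_group: int   # O2 molecules consumed per group acted on
    groups: int         # number of groups acted on per sterol molecule
    from_text: bool     # True if the count is stated in this section


BUDGET = [
    # Assumed: one O2 for the epoxidation of squalene.
    O2Use("squalene monooxygenase", 1, 1, False),
    # Stated: the P450 cycle operates three times on the 14-alpha methyl group.
    O2Use("CYP51 (C-14 demethylase)", 3, 1, True),
    # Stated: the oxidase acts three times per methyl; there are two C-4 methyls.
    O2Use("C-4 methyl oxidase", 3, 2, True),
    # Assumed: one O2 for a single desaturation step.
    O2Use("sterol desaturase", 1, 1, False),
]


def total_o2(budget, only_from_text: bool = False) -> int:
    """Sum O2 over the budget, optionally restricted to counts stated in the text."""
    return sum(
        u.o2_per_group * u.groups
        for u in budget
        if u.from_text or not only_from_text
    )


if __name__ == "__main__":
    print("demethylation steps alone:", total_o2(BUDGET, only_from_text=True))
    print("full budget under the stated assumptions:", total_o2(BUDGET))
```

Under these assumptions the three demethylations alone account for nine of the eleven O2 molecules, which is why any anaerobic alternative would have to replace the demethylases in particular.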


(d)  On  the  occurrence  of  sterols  in  cyanobacteria 
In connection with earlier studies of 2-methylhopanoids
and other biomarkers in strains of cultured
cyanobacteria (Summons et al. 1999; Jahnke et al.
2004), we checked for the presence of sterols in the
total lipid extracts of some of the genera previously
reported to contain them. Samples were hydrolysed
with acid in order to render conjugated sterols in the
free form and then converted to trimethylsilyl derivatives
for GC-MS. As figure 5 shows, our cultured
samples contained an abundance of sterols. Moreover,
the distributions were similar in all samples with many
of the same compounds reported by previous workers
including a strong predominance of C29Δ5, C29Δ5,22
and C29Δ5,24(28) (figure 5).

Figure 6. Gas chromatography-mass spectrometry data,
depicted as the m/z 129 ion which is diagnostic for
trimethylsilyl (TMS) sterols, for some cyanobacteria cultured
in the presence of cycloheximide (a-d). The only detectable
sterol is ergosterol which was also present in blank analyses of
the BG-11 culture medium and attributable to that source.

These  results  prompted  us  to  examine  other  options, 
one  of  which  was  to  re-culture  the  organisms  in  media 
with  defined  sterol  contents  and  in  the  presence  of 
cycloheximide,  a  compound  known  to  inhibit  the 
growth  of  eukaryotes  by  blocking  protein  synthesis. 
After several sub-cultures in the presence of cycloheximide
we could only identify traces of a C27 and a C28
sterol that were subsequently found to be components
of the BG-11 medium (figure 6). After re-culturing of
these  cyanobacteria  in  the  original  media,  and  in  the 
absence  of  cycloheximide,  they  continued  to  be  free  of 
detectable  sterols. 

The results of these experiments indicated that our
original cultures were contaminated by an organism
that produced C29 sterols in abundance. The source
of contamination was investigated using, firstly,
a universal gene probe for eukaryotes which was
positive and, secondly, a probe specific for Basidiomycetes
(rust fungi) which was also positive. Unlike other
fungi which produce ergosterol (C28Δ5) or cholesterol
(C27Δ5) as their major sterols, Uredospores such as
Uromyces phaseoli (bean rust), flax rust and Cronartium
fusiforme (fusiform rust) all produce C29 sterols like
their host plants. In these organisms, the principal
sterols are C29Δ7 and C29Δ7,24(28) (Jackson & Frear
1968; Lin et al. 1972; Carmack et al. 1976).

Lack  of  sterols  in  cyanobacteria  is  further  evidenced 
by  their  genome  sequences,  which  reveal  that  the  only 
genes  significantly  homologous  to  sterol  synthases  are 
squalene-hopene  synthases.  As  microbial  genome 
sequencing  progresses,  diverse  new  sterol  producers 
may  be  discovered  on  the  basis  of  gene  content;  the 
utility  of  this  approach  has  already  been  demonstrated 
(Pearson et al. 2003). Such searches will be aided by the
high  degree  of  conservation  among  sterol  synthesis 
genes,  which  makes  them  readily  recognizable  on  the 
basis  of  sequence  similarity.  It  is  worth  noting  that, 
among  the  258  prokaryotic  genomes  sequenced  and 
available  as  of  this  writing,  only  one  previously 
unknown  sterol  producer,  G.  obscuriglobus  (Pearson 
et  al.  2003),  has  been  discovered. 


4.  RELATING  BIOMARKERS  TO  BIOLOGICAL 
AND  GEOCHEMICAL  EVOLUTION 
(a)  Membrane  function  of  sterols  and 
evolutionary  and  ecological  adaptation 
The  function  of  sterols  in  cellular  membranes  has  been 
a  topic  of  long-standing  interest  among  biochemists 
and  cell  biologists.  Fifty  years  after  the  central  steps  in 
the  sterol  synthesis  pathway  were  elucidated  by  Bloch 
and  co-workers,  understanding  of  the  structural  and 
functional  roles  of  these  cardinal  eukaryotic  lipids 
remains  incomplete  and  an  active  area  of  research.  It 
was  recognized  early  on  that  sterols  modulate  the 
micro-scale fluidic properties of the membrane (its
density, viscosity, and so on) and that sterols that differ
by  only  the  addition  of  a  methyl  group  or  a  double  bond 
can  produce  measurably  different  effects.  The  principal 
mechanism  for  this  structural  effect  of  sterols  is  their 
induction  of  a  liquid-ordered  phase  in  membranes,  a 
state intermediate between high-temperature liquid-disordered
and low-temperature solid-ordered (gel)
phases. Further, the liquid-ordered and liquid-disordered
phases can coexist in the same membrane,
allowing  for  spatial  heterogeneity  and  the  notion  of  the 
membrane  as  a  ‘fluid  mosaic’  with  discrete  domains 
(lipid  rafts)  (Simons  &  Vaz  2004).  A  functional  role  for 
sterols  has  been  suggested  in  the  reduction  of 
permeability  of  membranes  to  cations,  particularly 
protons and sodium, hence assisting in energy conservation
(Haines 2001). This is likely due to enhanced
exclusion  of  water  clusters  or  chains  from  the 
membrane,  though  the  precise  mechanism  remains 
under  investigation  (Tepper  &  Voth  2005). 

The  foregoing  discussion  of  the  structural  and 
functional  characteristics  of  steroids  could  apply 
essentially  equally  well  to  hopanoids,  which  are 
membrane  constituents  of  many  bacteria.  The  two 
classes of lipids share the basic structure of a polycyclic
skeleton with a side chain and are of nearly identical molecular
dimensions (approx. 8 Å by 19 Å). Hopanoids have
been  demonstrated  to  influence  membrane  ordering 
and  fluid  properties  similarly  to  steroids  (Kannenberg 
et  al.  1983).  On  the  basis  of  such  comparisons, 
hopanoids  have  been  termed  ‘sterol  surrogates’  and 
‘functionally  equivalent’  in  membranes  (Ourisson  et  al. 
1987). This raises the question: why should functionally
equivalent  molecules  be  so  strongly  differentially 
conserved  across  the  breadth  of  the  diversity  of  cellular 
life? 

While  steroids  and  hopanoids  are  structurally 
similar,  they  differ  in  a  key  respect:  steroids  have  their 
polar  group  attached  directly  to  the  ring  structure  at 
C3,  while  the  polar  functions  in  hopanoids  are  attached 
to  the  side  chain.  As  a  result,  the  ring  structure  of 
steroids  sits  near  the  edge  of  the  lipid  bilayer,  but  that  of 
hopanoids  is  nearer  the  centre.  This  suggests  that  the 
two  types  of  terpenoid  may  move  quite  differently  in 
membranes,  particularly  with  regard  to  their  ability  to 
translocate  from  one  leaflet  of  the  bilayer  to  the  other,  a 
phenomenon  known  as  ‘lipid  flip-flop’. 

Flip-flop  is  an  important  property  of  membrane 
lipids  because  in  order  for  a  membrane  to  deform — i.e. 
to  curve  inward  or  outward — one  leaflet  of  the  bilayer 
must  become  longer  while  the  other  becomes  shorter. 
This  curvature  is  effected  by  flipping  lipid  molecules 
from  one  leaflet  to  the  other.  Sterols  have  among  the 
shortest t1/2 values for transbilayer flip-flop of any
membrane  lipid  (Holthuis  &  Levine  2005),  meaning 
that  steroid-containing  membranes  are  readily 
deformed.  This  was  elegantly  demonstrated  (Bacia 
et  al.  2005)  in  a  cell-free  system  of  giant  unilamellar 
vesicles,  where  it  was  found  that  not  only  does  the 
addition  of  sterols  to  lipid  vesicles  induce  domain 
formation  and  budding,  but  the  type  of  sterol  controls 
the  direction  of  curvature.  Cholesterol  and  lophenol 
induce  positive  (outward)  curvature,  while  lanosterol 
and  cholesteryl  sulphonate  cause  inward  (negative) 
budding.  Varying  the  proportions  of  a  mixture  of 
cholesterol  and  cholesteryl  sulphonate  controlled  both 
domain  size  and  budding  behaviour.  Eukaryotic  cells 
have  highly  specific  structural  requirements  for  their 
membrane  sterols:  changing  even  the  position  of 
unsaturation  in  the  ring  system  or  the  stereochemistry 
of  the  hydroxyl  group  attachment  can  result  in  an 
incompetent  cell  envelope  (Xu  et  al.  2005).  Taken 
together,  this  evidence  suggests  that  a  membrane  with  a 
well-regulated  sterol  composition  is  a  powerful  tool  for 
export  and  import  across  the  cell  membrane. 

Eukaryotes,  both  unicellular  and  multicellular, 
make  extensive  use  of  endo-  and  exocytosis.  The 
innovation  of  sterol  biosynthesis,  in  allowing  rapid 
membrane  deformation,  may  have  been  a  key  step  in 
the  evolution  of  these  processes.  In  eukaryotes, 
phagocytosis  and  membrane  biogenesis  are  closely 
coupled.  When  part  of  a  membrane  is  drawn  in  to 
engulf  a  particle,  lipid  synthesis  is  stimulated  (through 
the sterol regulatory element binding protein transcriptional
regulators) to just the degree to replace the
consumed membrane segment (Castoreno et al. 2005).
With a flexible, deformable membrane, many mechanisms
to generate curvature are possible, including
protein scaffolding, helix insertion and active
cytoskeletal remodelling (McMahon & Gallop 2005).
Much  of  the  dynamic  character  of  the  eukaryotic 
membrane  system  can  be  attributed  to  these  curvature 
mechanisms.  Though  experimental  quantification  of 
the  transbilayer  movement  of  hopanoids  (particularly 
their flip-flop half-time) is lacking, the absence of endo-
and  exocytosis  among  the  bacteria  may  indicate  that 
hopanoids  are  not  functionally  equivalent  to  sterols  in 
this  regard. 

The  ability  to  perform  this  type  of  transmembrane 
transport  had  profound  evolutionary  and  ecological 
consequences.  In  essence,  the  invention  of  endocytosis 
is  the  dawn  of  predation.  Prior  to  endocytosis, 
heterotrophy  proceeded  largely  through  the  dissolved 
phase,  and  no  stratified  trophic  relationships  existed. 
Once  large  particles  (including  other  cells)  could  be 
imported  and  enzymatically  degraded  intracellularly, 
the predator-prey dynamic that shaped much of
evolutionary history was established.

(i) Possible role of O2 in the biosynthesis
of  2-methylbacteriohopanepolyols 

Precise  details  of  the  biosynthetic  pathway  leading  to 
2-methylbacteriohopanepolyols  (2Me-BHP)  are  not 
known  although  the  methyl  group  is  known  to  be 
transferred  intact  from  L-methionine  (Zundel  & 
Rohmer  1985).  Genomes  of  sequenced  cyanobacteria 
and  M.  capsulatus  (which  produces  3Me-BHP)  reveal 
that they contain homologues of sterol-methyltransferases
found in plants. The transfer of a methyl group to
the  hopanoid  ring  structure  is  presumably  preceded  by 
desaturation  at  the  2-3  position.  Aerobic  sterol 
desaturases  are  present  in  the  genomes  of  several 
cyanobacteria,  with  unknown  function.  It  has  yet  to  be 
demonstrated  that  either  sterol-methyltransferases  or 
sterol  desaturases  are  involved  in  the  methylation  of  the 
hopanoid  A-ring,  but  their  potential  should  be 
investigated. 

(ii)  Alternative  oxidants  on  the  early  Earth 
Oxidizing  power  may  have  been  scarce  on  the  Earth’s 
surface  before  the  oxygenation  of  the  atmosphere. 
Under  such  conditions,  it  is  possible  that  cellular  life 
made  use  of  oxidants  that  have  since  been  supplanted 
by  nearly  ubiquitous  O2.  Postulated  scenarios  for  the 
use  of  such  alternative  oxidants  should  uphold  criteria 
of  geochemical  and  biochemical  viability;  nitrogen 
oxides  fail  the  first  test,  while  water,  as  discussed 
above  for  SQMO,  generally  fails  the  second.  One 
feasible  alternative  is  hydrogen  peroxide.  Significant 
H2O2  is  generated  by  the  reaction  of  water  with 
pyrite  under  anaerobic,  UV-illuminated  conditions 
(Blankenship  &  Hartman  1998;  Borda  et  al.  2003),  a 
plausible  scenario  on  the  early  Earth.  The  use  of 
peroxide  as  a  ‘transitional’  redox  partner  (both  as  an 
oxidant  and  as  a  reductant)  in  biogeochemical 
evolution  has  been  discussed  previously  (Kasting  et  al. 
1985;  Blankenship  &  Hartman  1998;  Borda  et  al. 
2003).  Hydrogen  peroxide  may  have  been  a  suitable 
oxidant  for  biosynthetic  oxygenation  reactions,  such  as 
those  described  above  in  sterol  synthesis.  Such  a 
scenario has, at minimum, three prerequisites to be
fulfilled: (i) H2O2 must have been produced in
geochemically significant quantities and have been
available to micro-organisms in a variety of habitats;
(ii) the enzymes of the pathway of interest must be
able to use H2O2 as an oxidant, and themselves be
stable in concentrations of peroxide thought likely to
arise; and (iii) cells must be able to use exogenous peroxide
anabolically to produce the biochemicals in question.
Even if only the first condition can be demonstrated,
further chemistry (such as the iron-catalysed Haber-Weiss
reaction) should be considered as sources for
redox partners for metabolism. Exploration of these
possibilities will likely lead to new insights into the
coevolution of the redox chemistry of the geosphere
and biosphere, and highlight the importance of understanding
the evolutionary biochemistry of reactive
oxygen  species. 

5.  SUMMARY  AND  FUTURE  DIRECTIONS 

The  sequences  of  key  enzymes  in  steroid  biosynthesis 
are  very  highly  conserved  within  the  eukaryotic  domain 
and  it  appears  likely  that  the  initial  steps  of  the  pathway 
were  present  in  their  modern  form  in  the  last  common 
ancestor  of  eukaryotes. 

Steroid  biosynthesis  is  an  oxygen  intensive  process 
with, for example, 11 molecules of O2 being required
for  the  synthesis  of  one  molecule  of  cholesterol.  It  is 
also  energetically  expensive.  While  one  can  postulate 
anaerobic  alternatives  to  some  steps  in  the  pathway, 
these  would  be  even  more  energy  intensive.  Any 
postulate  for  an  ancestral  anaerobic  pathway  to  sterols 
must  explain  the  replacement  of  five  enzymes  with 
anaerobic  equivalents  and,  further,  that  all  of  these 
have  been  lost  or  are  unknown. 

Previous reports of sterol biosynthesis in cyanobacteria
appear to be erroneous. It seems that cyanobacterial
cultures are easily contaminated by fungi related
to rusts. The sterol biosynthetic capability of other
Bacteria is patchily distributed, characterized by pathways
that are either anomalous or incomplete, and
likely gained from Eukarya by lateral gene transfer.

The  generally  accepted  hypothesis  that  hopanoids 
are  sterol  surrogates  in  bacteria  deserves  re-visiting 
with  investigations  of  the  localization  and  functional 
roles of BHP. In particular, studies of the phylogenetic
distribution, biosynthesis and functional role of 2Me-BHP
in cyanobacteria would be valuable.
As  a  starting  point,  one  could  hypothesize  that  the 
biosynthesis  of  2Me-BHP  in  cyanobacteria  involves  an 
oxygen-dependent  desaturase  and  a  methyltransferase 
analogous  to  those  employed  in  sterol  biosynthesis. 

Understanding  early  steps  in  cellular  evolution  will 
be  aided  by  more  detailed  studies  of  hydrocarbons  in 
Archaean  and  Proterozoic  rocks.  Further  studies  of  the 
membrane  function  of  sterols  and  their  role  in 
evolutionary  and  ecological  adaptation  will  also  be 
valuable. 